TL;DR
The paper introduces Multiscale Graph Correlation (MGC), a dependence test that detects relationships across diverse biological data types using fewer samples than existing tests, and that offers insight into the underlying geometry of the data.
Contribution
MGC combines previously separate data science techniques (k-nearest neighbors, kernel methods, and multiscale analysis) to improve dependence testing. This reduces sample-size requirements and makes the test's decisions interpretable on complex biological datasets.
Findings
Outperforms existing methods in high-dimensional and nonlinear scenarios
Requires fewer samples to achieve comparable statistical power
Provides insights into the latent geometric structure of data
Abstract
Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets. While existing approaches can test whether two properties are related, they often require unfeasibly large sample sizes in real data scenarios, and do not provide any insight into how or why the procedure reached its decision. Our approach, "Multiscale Graph Correlation" (MGC), is a dependence test that juxtaposes previously disparate data science techniques, including k-nearest neighbors, kernel methods (such as support vector machines), and multiscale analysis (such as wavelets). Other methods typically require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, spanning…
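To make the "k-nearest neighbors plus multiscale" idea concrete, here is a minimal sketch under simplifying assumptions. It is not the published MGC algorithm. It U-centers each distance matrix (as in bias-corrected distance correlation), restricts the correlation to pairs that are within the k nearest neighbors in X and the l nearest neighbors in Y, evaluates a coarse grid of (k, l) scales, takes the maximum, and calibrates that maximum with a permutation test. The published method differs in its centering and rank conventions and smooths over scales instead of taking a raw maximum. All function names and defaults here (`multiscale_dependence_test`, `n_scales`, `n_perm`) are hypothetical.

```python
import numpy as np


def pairwise_distances(z):
    """Euclidean distance matrix for n samples stored as an (n, p) or (n,) array."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def u_center(d):
    """U-centering from bias-corrected distance correlation (needs n > 2); diagonal set to 0."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    out = d - row - col + total
    np.fill_diagonal(out, 0.0)
    return out


def neighbor_ranks(d):
    """ranks[i, j] = r when sample j is the r-th nearest neighbor of sample i (self is 0)."""
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(d.shape[0])[:, None], order] = np.arange(d.shape[1])[None, :]
    return ranks


def local_correlation_map(a, b, rx, ry, scales):
    """Correlation of a and b restricted to pairs within k X-neighbors and l Y-neighbors."""
    b_masked = [b * (ry <= l) for l in scales]
    var_y = [np.sum(bl ** 2) for bl in b_masked]
    out = np.zeros((len(scales), len(scales)))
    for i, k in enumerate(scales):
        ak = a * (rx <= k)
        var_x = np.sum(ak ** 2)
        for j, bl in enumerate(b_masked):
            denom = np.sqrt(var_x * var_y[j])
            # Cauchy-Schwarz keeps each entry in [-1, 1].
            out[i, j] = np.sum(ak * bl) / denom if denom > 0 else 0.0
    return out


def multiscale_dependence_test(x, y, n_scales=10, n_perm=200, seed=0):
    """Max local correlation over a (k, l) grid, calibrated by permuting y (hypothetical)."""
    dx, dy = pairwise_distances(x), pairwise_distances(y)
    a, b = u_center(dx), u_center(dy)
    rx, ry = neighbor_ranks(dx), neighbor_ranks(dy)
    n = a.shape[0]
    # Coarse grid of neighborhood sizes; k = l = n - 1 is the global scale.
    scales = np.unique(np.linspace(1, n - 1, n_scales).round().astype(int))

    stat_map = local_correlation_map(a, b, rx, ry, scales)
    observed = stat_map.max()
    ki, li = np.unravel_index(np.argmax(stat_map), stat_map.shape)

    # Permuting the samples of y breaks any x-y dependence. Centered distances and
    # neighbor ranks are permutation-equivariant (up to ties), so permute them directly.
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for t in range(n_perm):
        perm = rng.permutation(n)
        idx = np.ix_(perm, perm)
        null[t] = local_correlation_map(a, b[idx], rx, ry[idx], scales).max()
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value, (int(scales[ki]), int(scales[li]))


# Usage: a nonlinear (quadratic) relationship with a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(60, 1))
y = x ** 2 + 0.05 * rng.normal(size=(60, 1))
stat, p_value, scale = multiscale_dependence_test(x, y)
print(f"statistic={stat:.3f}, p={p_value:.4f}, optimal scale (k, l)={scale}")
```

At k = l = n - 1 every pair is included, so that cell of the map equals the bias-corrected (U-centered) sample distance correlation. Returning the maximizing (k, l) loosely mirrors the abstract's point about explaining the decision: in this sketch, an optimum well below n - 1 indicates that local neighborhoods carry more of the dependence signal than the global view does.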
