Distance Correlation Methods for Discovering Associations in Large Astrophysical Databases
Elizabeth Martinez-Gomez, Mercedes T. Richards, Donald St. P. Richards

TL;DR
This paper demonstrates that the distance correlation coefficient is a powerful tool for detecting complex, nonlinear associations in large astrophysical datasets. It outperforms traditional measures such as the Pearson correlation coefficient and the maximal information coefficient.
Contribution
It applies the distance correlation coefficient to astrophysical data and shows its advantages in large galaxy databases, both for detecting nonlinear associations and for identifying variables that are independent.
Findings
Distance correlation detects more associations than the Pearson correlation coefficient.
It outperforms the maximal information coefficient in resolving data patterns.
Applicable across various galaxy types and redshifts.
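To show concretely how distance correlation can detect an association that Pearson misses, here is a small hand-worked example. It uses toy data chosen purely for illustration, not data from the paper: the three points x = (-1, 0, 1) and y = x² = (1, 0, 1). The calculation follows the standard sample distance correlation of Székely, Rizzo and Bakirov. Let a_{kl} = |x_k - x_l| and b_{kl} = |y_k - y_l|. Double-centering these gives A_{kl} = a_{kl} - ā_{k·} - ā_{·l} + ā_{··}, and B is built from b in the same way. Pearson's r is exactly zero for these points, but the distance correlation is not:

```latex
\begin{aligned}
\sum_{k}(x_k-\bar x)(y_k-\bar y) &= (-1)\left(\tfrac13\right) + (0)\left(-\tfrac23\right) + (1)\left(\tfrac13\right) = 0
  \;\Longrightarrow\; r_{xy} = 0,\\[4pt]
A &= \tfrac19\begin{pmatrix}-10 & 2 & 8\\ 2 & -4 & 2\\ 8 & 2 & -10\end{pmatrix},\qquad
B = \tfrac19\begin{pmatrix}-2 & 4 & -2\\ 4 & -8 & 4\\ -2 & 4 & -2\end{pmatrix},\\[4pt]
\mathcal{V}_n^2(x,y) &= \tfrac{1}{n^2}\textstyle\sum_{k,l} A_{kl}B_{kl} = \tfrac{8}{81},\qquad
\mathcal{V}_n^2(x) = \tfrac{40}{81},\qquad
\mathcal{V}_n^2(y) = \tfrac{16}{81},\\[4pt]
\mathcal{R}_n^2(x,y) &= \frac{\mathcal{V}_n^2(x,y)}{\sqrt{\mathcal{V}_n^2(x)\,\mathcal{V}_n^2(y)}}
  = \frac{8}{\sqrt{640}} = \frac{1}{\sqrt{10}},
\qquad \mathcal{R}_n(x,y) = 10^{-1/4} \approx 0.562.
\end{aligned}
```

The dependence of y on x is perfectly deterministic but symmetric, so the positive and negative contributions to the Pearson covariance cancel. Distances between observations do not cancel in this way.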
Abstract
High-dimensional, large-sample astrophysical databases of galaxy clusters, such as the Chandra Deep Field South COMBO-17 database, provide measurements on many variables for thousands of galaxies and a range of redshifts. Current understanding of galaxy formation and evolution rests sensitively on relationships between different astrophysical variables; hence an ability to detect and verify associations or correlations between variables is important in astrophysical research. In this paper, we apply a recently defined statistical measure called the distance correlation coefficient which can be used to identify new associations and correlations between astrophysical variables. The distance correlation coefficient applies to variables of any dimension; it can be used to determine smaller sets of variables that provide equivalent astrophysical information; it is zero only when variables…
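The abstract notes that the distance correlation coefficient applies to variables of any dimension. The following minimal NumPy sketch shows how the sample statistic can be computed. It assumes the standard V-statistic form with double-centered Euclidean distance matrices and the convention that the coefficient is 0 when either sample is constant. The function names `double_centered_distances` and `distance_correlation` are hypothetical helpers written for this illustration. They are not the paper's code or the API of any particular package.

```python
import numpy as np


def double_centered_distances(z) -> np.ndarray:
    """Hypothetical helper: double-centered matrix of pairwise Euclidean distances.

    Rows of `z` are observations. A 1-D input is treated as n scalar
    observations, so variables of any dimension are handled the same way.
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    # a[k, l] = ||z_k - z_l||, computed by broadcasting to shape (n, n, p)
    a = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # A[k, l] = a[k, l] - (row mean k) - (column mean l) + (grand mean)
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def distance_correlation(x, y) -> float:
    """Hypothetical helper: sample distance correlation R_n(x, y) in [0, 1]."""
    A = double_centered_distances(x)
    B = double_centered_distances(y)
    if A.shape != B.shape:
        raise ValueError("x and y must contain the same number of observations")
    dcov2 = np.mean(A * B)    # squared sample distance covariance V_n^2(x, y)
    dvar2_x = np.mean(A * A)  # squared sample distance variance V_n^2(x)
    dvar2_y = np.mean(B * B)  # squared sample distance variance V_n^2(y)
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # assumed convention when either sample is constant
    # Clip tiny negative round-off before taking the square root
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=500)
    y = x**2  # nonlinear, symmetric dependence
    print("Pearson r:", np.corrcoef(x, y)[0, 1])
    print("distance correlation:", distance_correlation(x, y))
    # Multivariate case: a 2-D variable against a scalar one
    xz = np.column_stack([x, rng.normal(size=500)])
    print("dCor(2-D, 1-D):", distance_correlation(xz, y))
```

In the demo, the Pearson coefficient for y = x² should typically be close to zero, while the distance correlation stays clearly positive. The broadcast step holds an n × n × p array in memory. That is workable for a few thousand objects with a handful of variables, but much larger catalogs would need chunking or a dedicated implementation.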
