Predictive Data Calibration for Linear Correlation Significance Testing
Kaustubh R. Patil, Simon B. Eickhoff, Robert Langner

TL;DR
This paper introduces a machine-learning-based data calibration method to improve the accuracy of linear relationship significance testing, providing better $p$-values and correlation estimates, and potentially reducing the need for multiple testing corrections.
Contribution
It presents a novel data calibration approach that conditions samples on expected linear relationships, enhancing $p$-value interpretation and correlation estimation in significance testing.
Findings
Calibrated $p$-values can be interpreted as posterior probabilities.
The method improves correlation strength estimation under limited samples.
Empirical tests show advantages over traditional PCC-based significance testing.
Abstract
Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population. Pearson's correlation coefficient (PCC), the de facto measure for bivariate relationships, is known to be lacking in both regards. The estimated strength may be wrong due to limited sample size and nonnormality of the data. In the context of statistical significance testing, erroneous interpretation of a $p$-value as a posterior probability leads to Type I errors, a general issue with significance testing that extends to PCC. Such errors are exacerbated when testing multiple hypotheses simultaneously. To tackle these issues, we propose a machine-learning-based predictive data calibration method which essentially conditions the data samples on the…
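To make the baseline concrete, the following is a minimal sketch of conventional PCC-based significance testing, the procedure the abstract criticizes. It is not the paper's calibration method. The function names (`pearson_r`, `permutation_p_value`, `false_positive_rate`), the choice of a permutation test, and all parameter defaults are assumptions made for illustration only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null of no linear association.

    Hypothetical helper: shuffles y to break any pairing with x and counts
    how often the shuffled |r| is at least as large as the observed |r|.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = abs(pearson_r(x, y))
    hits = 0
    for _ in range(n_perm):
        if abs(pearson_r(x, rng.permutation(y))) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def false_positive_rate(n_samples=20, n_tests=200, alpha=0.05, seed=1):
    """Fraction of independent (null) variable pairs declared significant."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for i in range(n_tests):
        x = rng.normal(size=n_samples)
        y = rng.normal(size=n_samples)  # independent of x by construction
        if permutation_p_value(x, y, n_perm=200, seed=i) < alpha:
            rejections += 1
    return rejections / n_tests


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=15)
    y = 0.5 * x + rng.normal(size=15)  # weak linear signal, small sample
    print("r =", pearson_r(x, y))
    print("p =", permutation_p_value(x, y))
    print("null rejection rate =", false_positive_rate())
```

Under the null, roughly a fraction `alpha` of tests is still rejected, so running many tests at once produces false positives even when no relationship exists. The $p$-value here is the probability of data at least this extreme given no association, not the probability that the relationship is real, which is the misreading the abstract describes.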
Taxonomy
Topics: Mental Health Research Topics · Statistical Methods in Clinical Trials · Advanced Statistical Modeling Techniques
Methods: Test
