Efficient algorithms for the sensitivities of the Pearson correlation coefficient and its statistical significance to online data
Marc Harary

TL;DR
This paper introduces efficient, closed-form algorithms to assess the maximum impact of new data on Pearson correlation and its significance, enabling real-time sensitivity analysis in streaming data contexts.
Contribution
It develops a rigorous theoretical framework, together with linear- and constant-time algorithms, for measuring the maximal change in the Pearson correlation coefficient and its p-value that additional data can induce.
Findings
Closed-form solutions for correlation sensitivity
Linear- and constant-time algorithms for updates
Software implementation available for practical use
Abstract
Reliably measuring the collinearity of bivariate data is crucial in statistics, particularly for time-series analysis or ongoing studies in which incoming observations can significantly impact current collinearity estimates. Leveraging identities from Welford's online algorithm for sample variance, we develop a rigorous theoretical framework for analyzing the maximal change to the Pearson correlation coefficient and its p-value that can be induced by additional data. Further, we show that the resulting optimization problems yield elegant closed-form solutions that can be accurately computed by linear- and constant-time algorithms. Our work not only creates new theoretical avenues for robust correlation measures, but also has broad practical implications for disciplines that span econometrics, operations research, clinical trials, climatology, differential privacy, and bioinformatics.…
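For readers unfamiliar with the Welford-style identities the abstract refers to, the following is a minimal sketch of how a Pearson correlation can be maintained in constant time per incoming observation. It is not the paper's sensitivity algorithm. It only illustrates the standard running-sum updates, and the names RunningCorrelation, update, r, and r_if_added are hypothetical, chosen for this example.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class RunningCorrelation:
    """Welford-style running statistics for paired observations (x, y).

    Illustrative sketch only; class and method names are assumptions,
    not taken from the paper or its software.
    """
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0  # running sum of squared deviations of x
    m2_y: float = 0.0  # running sum of squared deviations of y
    c_xy: float = 0.0  # running sum of co-deviations of x and y

    def update(self, x: float, y: float) -> None:
        """Fold one new observation into the running statistics in O(1)."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the OLD mean of x
        dy = y - self.mean_y  # deviation from the OLD mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Each product pairs an old-mean deviation with a new-mean deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def r(self) -> float:
        """Current Pearson correlation, or NaN if either variance is zero."""
        denom = math.sqrt(self.m2_x * self.m2_y)
        if denom == 0.0:
            return float("nan")
        return self.c_xy / denom

    def r_if_added(self, x: float, y: float) -> float:
        """Correlation that would result from adding (x, y), without mutating self."""
        trial = replace(self)  # shallow copy of the running statistics
        trial.update(x, y)
        return trial.r()


if __name__ == "__main__":
    rc = RunningCorrelation()
    for x, y in zip([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):
        rc.update(x, y)
    print("current r:", rc.r())
    print("r after a hypothetical point (6, 0):", rc.r_if_added(6, 0))
```

Evaluating r_if_added over candidate points gives a brute-force view of how much one new observation can move the coefficient. The paper's contribution, per the abstract, is closed-form solutions for the maximal such change, which this sketch does not attempt to reproduce.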
Taxonomy
Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence
