$2B$ or Not $2B$: A Tale of Three Algorithms for Streaming Covariance Estimation after Welford and Chan-Golub-LeVeque
Felix Reichel

TL;DR
This paper compares three algorithms for streaming covariance estimation, analyzing their algebraic, numerical, and statistical properties, and introduces a conformal prediction framework for confidence sets in streaming data.
Contribution
It provides a unified foundation for three covariance algorithms, compares their finite-precision behavior, and introduces a distribution-free confidence set method for streaming covariance estimates.
Findings
The Gram algorithm is fastest for batch computation.
The Welford algorithm is most robust to large mean shifts.
The Chan-Golub-LeVeque (CGL) algorithm is best suited to distributed settings.
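The contrast between the Gram and Welford findings can be illustrated with a small pure-Python sketch. The function names `gram_cov` and `welford_cov`, the toy data, and the shift of `1e9` are assumptions chosen for illustration and do not come from the paper; the update rules are the textbook forms of each method, not necessarily the exact variants the paper analyzes.

```python
def gram_cov(rows):
    """Gram-style estimate: accumulate raw cross-products G and column sums s,
    then form (G - s s^T / n) / (n - 1) at the end."""
    p = len(rows[0])
    G = [[0.0] * p for _ in range(p)]
    s = [0.0] * p
    n = 0
    for x in rows:
        n += 1
        for i in range(p):
            s[i] += x[i]
            for j in range(p):
                G[i][j] += x[i] * x[j]
    return [[(G[i][j] - s[i] * s[j] / n) / (n - 1) for j in range(p)]
            for i in range(p)]


def welford_cov(rows):
    """Welford-style estimate: update a running mean and a centered
    outer-product accumulator M, then return M / (n - 1)."""
    p = len(rows[0])
    mean = [0.0] * p
    M = [[0.0] * p for _ in range(p)]
    n = 0
    for x in rows:
        n += 1
        d_old = [x[i] - mean[i] for i in range(p)]  # deviation from previous mean
        for i in range(p):
            mean[i] += d_old[i] / n
        d_new = [x[i] - mean[i] for i in range(p)]  # deviation from updated mean
        for i in range(p):
            for j in range(p):
                M[i][j] += d_old[i] * d_new[j]
    return [[M[i][j] / (n - 1) for j in range(p)] for i in range(p)]


# Hypothetical toy data: four 2-D points, then the same points shifted by 1e9.
base = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0)]
SHIFT = 1e9
shifted = [(x + SHIFT, y + SHIFT) for x, y in base]

g_base = gram_cov(base)     # both methods agree on well-scaled data
g = gram_cov(shifted)       # raw cross-products near 1e18 lose the low-order digits
w = welford_cov(shifted)    # centered updates stay close to the unshifted answer

print("Gram    (shifted):", g)
print("Welford (shifted):", w)
```

Covariance is invariant to a constant shift, so both calls on `shifted` should ideally reproduce the covariance of `base`. In double precision the Gram form subtracts two numbers of order 1e18, so the result depends on rounding, while the Welford form only ever works with deviations of order one.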
Abstract
We place three algorithms for computing the unbiased sample covariance matrix in streaming and distributed settings on a common algebraic, numerical, and statistical foundation. The Gram algorithm, derived from the variance reformulation, maintains the running cross-product matrix and the column-sum vector, yielding the unbiased covariance estimator at a per-update cost quadratic in the dimension. The Welford algorithm propagates a running mean and outer-product corrections, achieving the same asymptotic cost with improved numerical stability under large data shifts. The Chan-Golub-LeVeque algorithm supports block-parallel merging through the exact identity $M = M_A + M_B + \frac{n_A…
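Block-parallel merging can be sketched in a few lines of pure Python. The merge identity is truncated in the abstract above, so this sketch assumes the standard pairwise form, $M = M_A + M_B + \frac{n_A n_B}{n_A + n_B}\,\delta\delta^\top$ with $\delta = \bar{x}_B - \bar{x}_A$; the paper's exact statement may differ. The names `summarize` and `merge` and the two toy blocks are hypothetical.

```python
def summarize(rows):
    """Summarize one block as (n, mean, M), where M is the sum of
    centered outer products over the block."""
    n = len(rows)
    p = len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    M = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
          for j in range(p)] for i in range(p)]
    return n, mean, M


def merge(a, b):
    """Combine two block summaries using the assumed pairwise identity
    M = M_A + M_B + (n_A * n_B / n) * delta delta^T."""
    n_a, mean_a, M_a = a
    n_b, mean_b, M_b = b
    n = n_a + n_b
    p = len(mean_a)
    delta = [mean_b[i] - mean_a[i] for i in range(p)]
    mean = [mean_a[i] + delta[i] * n_b / n for i in range(p)]
    weight = n_a * n_b / n
    M = [[M_a[i][j] + M_b[i][j] + weight * delta[i] * delta[j]
          for j in range(p)] for i in range(p)]
    return n, mean, M


# Hypothetical blocks, as if processed by two separate workers.
block_a = [(1.0, 2.0), (2.0, 1.0)]
block_b = [(3.0, 5.0), (4.0, 3.0)]

merged = merge(summarize(block_a), summarize(block_b))
direct = summarize(block_a + block_b)  # single-pass reference on all rows

n, mean, M = merged
cov = [[M[i][j] / (n - 1) for j in range(len(mean))] for i in range(len(mean))]
print("merged covariance:", cov)
```

Each worker only needs to ship its count, mean, and `M` matrix, and the merge touches no raw data, which is what makes this form attractive for distributed settings.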
