Extending Sheldon M. Ross's Method for Efficient Large-Scale Variance Computation
Jiawen Li

TL;DR
This paper presents Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed statistics instead of recomputing from scratch, outperforming traditional methods most clearly when the prior dataset is large.
Contribution
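The paper introduces PKA, a batch-update variance method whose update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983). Its contribution is a runtime analysis, an explicit acceleration factor, and a generalization to covariance and other decomposable statistics.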
Findings
PKA achieves up to 454x speedup on large datasets.
Theoretical analysis shows when batch updating outperforms naive methods.
Benchmarks confirm PKA's efficiency on synthetic and real data.
Abstract
We introduce Prior Knowledge Acceleration (PKA), a batch-update method for variance that reuses previously computed sufficient statistics to avoid full recomputation. The update identity is algebraically equivalent to the pairwise formula of Chan, Golub, and LeVeque (1983); our contribution is a runtime-cost analysis that derives an explicit acceleration factor and identifies the data-size regime where batch updating outperforms both naive recomputation and Ross's single-sample method. We prove that Ross's approach is preferable only when the new batch contains a single sample. We further generalise the framework to covariance and other decomposable statistics. Benchmarks against Welford, Chan pairwise, and naive two-pass baselines on synthetic and real-world streaming data confirm the theoretical predictions, with speedups of up to 454x when the prior…
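To make the batch-update idea concrete, the following is a minimal sketch, not the paper's own code. It merges a stored summary of the prior data with a summary of the new batch using the Chan, Golub, and LeVeque pairwise identity, which the abstract states is algebraically equivalent to the PKA update. The names `Stats`, `summarize`, and `merge` are hypothetical and chosen for illustration only; the sufficient statistics assumed here are the count, the mean, and the sum of squared deviations from the mean.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Stats:
    """Hypothetical container for the sufficient statistics of a sample."""
    n: int = 0          # number of observations
    mean: float = 0.0   # sample mean
    m2: float = 0.0     # sum of squared deviations from the mean

    def variance(self, ddof: int = 0) -> float:
        # ddof=0 gives the population variance, ddof=1 the sample variance.
        if self.n - ddof <= 0:
            raise ValueError("not enough observations for this ddof")
        return self.m2 / (self.n - ddof)


def summarize(batch: Iterable[float]) -> Stats:
    """Two-pass summary of one batch; cost is linear in the batch size."""
    xs = list(batch)
    n = len(xs)
    if n == 0:
        return Stats()
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return Stats(n, mean, m2)


def merge(prior: Stats, new: Stats) -> Stats:
    """Combine two summaries with the Chan et al. pairwise identity.

    The prior data is never revisited: only its stored summary is used.
    """
    if prior.n == 0:
        return new
    if new.n == 0:
        return prior
    n = prior.n + new.n
    delta = new.mean - prior.mean
    mean = prior.mean + delta * new.n / n
    m2 = prior.m2 + new.m2 + delta * delta * prior.n * new.n / n
    return Stats(n, mean, m2)


if __name__ == "__main__":
    prior_data = [1.0, 2.0, 3.0, 4.0]
    new_data = [10.0, 20.0]
    merged = merge(summarize(prior_data), summarize(new_data))
    # Compare against a full recomputation over all observations.
    print(merged.variance(), statistics.pvariance(prior_data + new_data))
```

In this sketch the update touches only the new batch plus a constant amount of arithmetic on the stored summary, whereas a full recomputation touches every prior observation again. That difference is the source of the speedup the abstract describes when the prior dataset is large.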
