Online Censoring for Large-Scale Regressions with Application to Streaming Big Data
Dimitris Berberidis, Vassilis Kekatos, Georgios B. Giannakis

TL;DR
This paper develops online, data-adaptive algorithms for linear regression that omit less informative data points, reducing computational complexity while maintaining statistical accuracy in large-scale streaming-data applications.
Contribution
It introduces novel stochastic approximation algorithms for censored observations with provable convergence, enabling adaptive data reduction in large-scale linear regression.
Findings
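The censoring-based algorithms maintain estimation quality while omitting a significant share of the observations.
Data-adaptive censoring lowers the computational cost of large-scale regression.
The algorithms are validated on real and synthetic datasets.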
Abstract
Linear regression is arguably the most prominent among statistical inference methods, popular both for its simplicity and for its broad applicability. On par with data-intensive applications, the sheer size of linear regression problems creates an ever-growing demand for quick and cost-efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. The present paper introduces means of identifying and omitting "less informative" observations in an online and data-adaptive fashion, built on principles of stochastic approximation and data censoring. First- and second-order stochastic approximation maximum likelihood-based algorithms for censored observations are developed for estimating the regression coefficients. Online algorithms are also put forth to…
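To make the idea of omitting "less informative" observations concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simple rule in which an observation is censored when its prediction error under the current estimate is within `tau * sigma`, and it pairs that rule with a plain first-order (LMS-style) update that skips censored samples entirely. The paper's maximum likelihood-based updates are not reproduced here. The function name `censored_lms`, the threshold `tau`, the noise level `sigma`, the step size `step`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def censored_lms(X, y, tau, sigma, step):
    """First-order online regression that skips small-innovation samples.

    Hypothetical sketch: a sample is treated as censored (skipped) when
    |y_n - x_n^T theta| <= tau * sigma, otherwise an LMS update is applied.
    """
    theta = np.zeros(X.shape[1])
    used = 0  # number of samples that triggered an update
    for x_n, y_n in zip(X, y):
        innovation = y_n - x_n @ theta
        if abs(innovation) <= tau * sigma:
            continue  # censored: the current estimate already explains it
        theta = theta + step * innovation * x_n
        used += 1
    return theta, used


# Synthetic streaming data (assumed for illustration only).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1
X = rng.standard_normal((5000, 3))
y = X @ theta_true + sigma * rng.standard_normal(5000)

theta_hat, used = censored_lms(X, y, tau=1.0, sigma=sigma, step=0.05)
print("estimate:", theta_hat)
print("samples used:", used, "of", len(y))
```

Raising `tau` censors more samples and saves more computation, at the price of a coarser estimate. In this sketch the threshold is fixed by hand.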
