Online Censoring for Large-Scale Regressions with Application to Streaming Big Data

Dimitris Berberidis; Vassilis Kekatos; Georgios B. Giannakis

arXiv:1507.07536·stat.AP·June 29, 2016·IEEE Trans. Signal Process.

TL;DR

This paper develops online, data-adaptive algorithms for linear regression that identify and omit less informative observations, reducing computational cost while preserving statistical accuracy in large-scale streaming-data applications.

Contribution

The paper introduces first- and second-order stochastic approximation algorithms for censored observations, with provable convergence, that enable adaptive data reduction in large-scale linear regression.

Findings

01 Algorithms achieve comparable accuracy with reduced data.

02 Adaptive censoring improves computational efficiency.

03 Validated on real and synthetic datasets.

Abstract

Linear regression is arguably the most prominent among statistical inference methods, popular both for its simplicity and for its broad applicability. On par with data-intensive applications, the sheer size of linear regression problems creates an ever-growing demand for quick and cost-efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. The present paper introduces means of identifying and omitting "less informative" observations in an online and data-adaptive fashion, built on principles of stochastic approximation and data censoring. First- and second-order stochastic approximation maximum likelihood-based algorithms for censored observations are developed for estimating the regression coefficients. Online algorithms are also put forth to…
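
To make the idea of omitting "less informative" observations concrete, here is a minimal sketch of a first-order (LMS-style) update combined with a simple censoring rule. It is an illustration under stated assumptions, not the authors' algorithm: the rule used here (skip a sample when its prediction residual is within `tau * sigma`), the function name `censored_lms`, and the parameters `tau`, `step`, and `sigma` are all hypothetical choices made for this example, and the data are synthetic.

```python
import numpy as np

def censored_lms(X, y, tau, step, sigma=1.0):
    """First-order stochastic update that skips low-residual samples.

    Assumption for this sketch: a sample is treated as "less informative"
    when |y_n - x_n^T theta| <= tau * sigma, in which case no update is made.
    """
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    n_used = 0
    for x_n, y_n in zip(X, y):
        residual = y_n - x_n @ theta
        if abs(residual) <= tau * sigma:
            continue  # censored: the current estimate already predicts y_n well
        theta += step * residual * x_n  # standard LMS correction
        n_used += 1
    return theta, n_used

# Synthetic streaming data (hypothetical sizes and noise level).
rng = np.random.default_rng(0)
n, p = 5000, 5
theta_true = rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta_hat, used = censored_lms(X, y, tau=1.5, step=0.05, sigma=0.1)
print(f"samples used: {used} of {n}")
print(f"estimation error: {np.linalg.norm(theta_hat - theta_true):.4f}")
```

In this sketch, a larger `tau` censors more samples and so performs fewer updates, at the price of a coarser final estimate; `tau = 0` performs an update on essentially every sample, as in plain LMS.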
