Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

Michael W. Mahoney

arXiv:1203.0786 · cs.DS · March 6, 2012 · 5 cites

TL;DR

This paper explores how approximate computation can implicitly provide statistical regularization, bridging the gap between algorithmic and statistical data analysis for large-scale data.

Contribution

It demonstrates, through theoretical and empirical case studies, that approximate algorithms can serve as implicit regularizers, enhancing scalability and predictive accuracy.

Findings

01. Approximate computation can implicitly regularize noisy data.

02. Scalable algorithms can also improve inference and prediction.

03. Empirical case studies support the theoretical insights.
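
The first finding can be made concrete with a toy sketch. This example is not taken from the paper's case studies; it uses early-stopped gradient descent on a noisy least-squares problem as a stand-in for "approximate computation", and every name in it (`gradient_descent_least_squares`, `x_early`, `x_late`, the problem sizes, the noise level) is a hypothetical choice made for illustration. Starting from zero with a sufficiently small step size, the iterate's norm grows with the number of steps, so stopping early yields a smaller-norm solution, much as an explicit ridge penalty would.

```python
import random


def gradient_descent_least_squares(A, b, steps, lr):
    """Run `steps` gradient-descent iterations on 0.5 * ||A x - b||^2 from x = 0."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(steps):
        # residual = A x - b
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
            for i in range(n_rows)
        ]
        # gradient = A^T (A x - b)
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        x = [x[j] - lr * grad[j] for j in range(n_cols)]
    return x


def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(t * t for t in v) ** 0.5


# Hypothetical noisy regression problem (sizes and noise level are arbitrary).
random.seed(0)
n_rows, n_cols = 30, 5
A = [[random.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
x_true = [1.0, -2.0, 0.5, 0.0, 3.0]
b = [
    sum(A[i][j] * x_true[j] for j in range(n_cols)) + random.gauss(0.0, 1.0)
    for i in range(n_rows)
]

# Step size 1 / ||A||_F^2 is at most 1 / lambda_max(A^T A), so the iteration is stable.
lr = 1.0 / sum(a * a for row in A for a in row)

x_early = gradient_descent_least_squares(A, b, steps=5, lr=lr)     # approximate
x_late = gradient_descent_least_squares(A, b, steps=5000, lr=lr)   # near-exact

print("norm after 5 steps:   ", norm(x_early))
print("norm after 5000 steps:", norm(x_late))
```

The early-stopped solution is shrunk toward zero relative to the near-exact one. Whether that shrinkage helps prediction depends on the noise level and the stopping point, which this sketch does not attempt to tune.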

Abstract

Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely,…

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques