Loss minimization and parameter estimation with heavy tails

Daniel Hsu; Sivan Sabato

arXiv:1307.1827 · cs.LG · April 19, 2016 · 40 cites

Open Access

TL;DR

This paper introduces a robust estimation technique effective under heavy-tailed distributions, enabling near-optimal parameter estimation for various models without requiring bounded or subgaussian data.

Contribution

It generalizes the median-of-means estimator to arbitrary metric spaces and uses it to approximately minimize smooth and strongly convex losses and to estimate parameters in heavy-tailed settings.
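A minimal sketch of the selection step of a metric-space median-of-means follows. The candidates would be independent estimates computed on disjoint subsamples, and `dist` is any metric. It assumes one natural scoring rule: each candidate is scored by the smallest radius whose ball around it contains a strict majority of all candidates, and the lowest-scoring candidate is returned. The function name `generalized_median` and this exact rule are illustrative assumptions, not a transcription of the paper's algorithm.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def generalized_median(candidates: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Select a 'median' among k candidate estimates in a metric space.

    Hypothetical helper: for each candidate, find the smallest radius r such
    that the ball of radius r around it contains a strict majority (> k/2) of
    all candidates (itself included); return the candidate with the smallest r.
    """
    k = len(candidates)
    if k == 0:
        raise ValueError("need at least one candidate")
    need = k // 2 + 1  # size of a strict majority
    best, best_radius = candidates[0], math.inf
    for c in candidates:
        # Sorted distances from c to every candidate (0.0 for c itself).
        dists = sorted(dist(c, other) for other in candidates)
        radius = dists[need - 1]  # smallest radius covering `need` candidates
        if radius < best_radius:
            best, best_radius = c, radius
    return best
```

On the real line with `dist=lambda a, b: abs(a - b)`, `generalized_median([1, 2, 3, 100, -100], ...)` returns `2`, the ordinary median; the two outliers do not influence the choice. With `math.dist` the same function works for points in R^d.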

Findings

01. Requires only Õ(d log(1/δ)) samples to obtain a constant-factor approximation to the optimal least squares loss with probability 1 - δ.
02. Applies to sparse linear regression and low-rank covariance matrix estimation.
03. Does not assume bounded or subgaussian covariates or noise.
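To illustrate how the pieces fit for least squares, here is a sketch under simplifying assumptions. The sample is split into k equal disjoint groups, ordinary least squares is fitted on each group, and the group fits are combined with a majority-radius selection rule using plain Euclidean distance. The helper names (`solve`, `ols`, `generalized_median`, `mom_least_squares`) are hypothetical, and the paper's actual metric and choice of k may differ. In median-of-means constructions k typically grows like log(1/δ); with groups of roughly d points each, this is one way to read the Õ(d log(1/δ)) sample count.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(xs: Sequence[Vector], ys: Sequence[float]) -> Vector:
    """Ordinary least squares (no intercept) via the normal equations X^T X w = X^T y."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)


def generalized_median(cands: Sequence[Vector]) -> Vector:
    """Pick the candidate whose smallest Euclidean ball covering a strict majority is tightest."""
    need = len(cands) // 2 + 1
    return min(cands, key=lambda c: sorted(math.dist(c, o) for o in cands)[need - 1])


def mom_least_squares(xs: Sequence[Vector], ys: Sequence[float], k: int) -> Vector:
    """Fit OLS on k disjoint equal-size groups and return the generalized median of the fits.

    Any leftover len(xs) % k points are ignored in this sketch.
    """
    size = len(xs) // k
    if size < len(xs[0]):
        raise ValueError("each group needs at least d points")
    fits = [ols(xs[g * size:(g + 1) * size], ys[g * size:(g + 1) * size]) for g in range(k)]
    return generalized_median(fits)


if __name__ == "__main__":
    # Synthetic data: true weights (2, -1), symmetric Pareto noise with finite
    # variance but no finite third moment (heavy-tailed).
    random.seed(1)
    xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(6000)]
    ys = [2 * x[0] - x[1] + random.choice((-1.0, 1.0)) * random.paretovariate(2.5) for x in xs]
    print(mom_least_squares(xs, ys, k=10))
```

Pure-Python linear algebra keeps the sketch dependency-free; with NumPy, `numpy.linalg.lstsq` could replace `solve` and `ols`.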

Abstract

This work studies applications and generalizations of a simple estimation technique that provides exponential concentration under heavy-tailed distributions, assuming only bounded low-order moments. We show that the technique can be used for approximate minimization of smooth and strongly convex losses, and specifically for least squares linear regression. For instance, our $d$-dimensional estimator requires just $\tilde{O}(d \log(1/\delta))$ random samples to obtain a constant factor approximation to the optimal least squares loss with probability $1 - \delta$, without requiring the covariates or noise to be bounded or subgaussian. We provide further applications to sparse linear regression and low-rank covariance matrix estimation with similar allowances on the noise and covariate distributions. The core technique is a generalization of the median-of-means estimator to arbitrary metric…
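The "exponential concentration" claim is easiest to see in the scalar case. The following is the textbook median-of-means argument for $n$ i.i.d. samples with mean $\mu$ and finite variance $\sigma^2$, split into $k$ groups of $m = n/k$ samples each (assuming $k$ divides $n$), with $\hat\mu_j$ the mean of group $j$. The median can only leave the interval if at least half of the group means do. The constants come from this particular Chebyshev-plus-Hoeffding route and are not claimed to be the paper's.

```latex
\begin{aligned}
&\Pr\!\left[\,|\hat\mu_j - \mu| > 2\sigma\sqrt{k/n}\,\right]
  \le \frac{\sigma^2/m}{4\sigma^2 k/n} = \frac{1}{4}
  && \text{(Chebyshev, each group)}\\
&\Pr\!\left[\,|\operatorname{median}_j \hat\mu_j - \mu| > 2\sigma\sqrt{k/n}\,\right]
  \le \Pr\!\left[\operatorname{Bin}\!\left(k, \tfrac14\right) \ge \tfrac{k}{2}\right]
  \le e^{-2k(1/4)^2} = e^{-k/8}
  && \text{(Hoeffding)}\\
&k = \lceil 8\ln(1/\delta)\rceil
  \;\Longrightarrow\;
  |\operatorname{median}_j \hat\mu_j - \mu| \le 2\sigma\sqrt{\frac{\lceil 8\ln(1/\delta)\rceil}{n}}
  \ \text{ with probability at least } 1-\delta.
\end{aligned}
```

The deviation scales like $\sigma\sqrt{\log(1/\delta)/n}$, the same order a subgaussian sample mean achieves, although only a second moment is assumed; Chebyshev applied to the plain empirical mean gives only a $1/\sqrt{\delta}$ dependence.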

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Statistical Methods and Inference · Distributed Sensor Networks and Detection Algorithms

Methods: Linear Regression