Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means
Mikael Møller Høgsgaard, Andrea Paudice

TL;DR
This paper analyzes the Median of Means estimator's effectiveness for heavy-tailed data, providing new bounds and applications to clustering and regression with unbounded inputs.
Contribution
It introduces a novel symmetrization technique to derive sample complexity bounds for mean estimation under weak moment conditions.
Findings
New sample complexity bounds for MoM in heavy-tailed settings
Improved results for k-means clustering with unbounded data
Enhanced linear regression methods with general losses
Abstract
The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class F when the data distribution possesses only the first p moments for p ∈ (1, 2]. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to k-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
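For readers unfamiliar with the estimator, the following is a minimal sketch of the standard scalar Median of Means procedure: split the sample into blocks, average each block, and return the median of the block averages. The function name `median_of_means`, the `num_blocks` parameter, and the optional shuffling are illustrative assumptions; the abstract does not specify how the paper chooses the number of blocks or how the estimator is applied across a whole function class.

```python
import numpy as np


def median_of_means(samples, num_blocks, rng=None):
    """Standard Median of Means estimate of a scalar mean.

    `num_blocks` is a hypothetical tuning parameter for this sketch;
    the paper's own choice is not stated in the abstract.
    """
    x = np.asarray(samples, dtype=float).ravel()
    if not 1 <= num_blocks <= x.size:
        raise ValueError("num_blocks must be between 1 and the sample size")
    if rng is not None:
        # Optional: shuffle so that block membership does not depend on input order.
        x = rng.permutation(x)
    # Drop the remainder so that every block has the same size.
    usable = (x.size // num_blocks) * num_blocks
    blocks = x[:usable].reshape(num_blocks, -1)
    # Median of the per-block empirical means.
    return float(np.median(blocks.mean(axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed example data: Lomax (Pareto II) with shape 1.5 has a finite
    # mean but infinite variance.
    data = rng.pareto(1.5, size=10_000)
    print("empirical mean :", data.mean())
    print("median of means:", median_of_means(data, num_blocks=50, rng=rng))
```

To estimate the means of several functions at once, one could call `median_of_means(f(data), num_blocks)` for each function `f` in a finite class. This is only an illustration of the uniform estimation task, not the paper's analysis.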
Taxonomy
Topics: Statistical Methods and Inference · Statistical Distribution Estimation and Applications
Methods: Linear Regression
