Robust machine learning by median-of-means : theory and practice
Guillaume Lecué, Matthieu Lerasle

TL;DR
This paper introduces new estimators for robust machine learning built on median-of-means (MOM) estimators of the mean. They achieve optimal convergence rates even when the dataset contains outliers, and the paper demonstrates their practical computability and outlier detection capabilities.
Contribution
The paper develops new median-of-means estimators with optimal convergence and robustness, and shows how to implement them efficiently in machine learning algorithms.
Findings
Estimators achieve optimal convergence rates under minimal assumptions.
The breakdown number is of the order of the number of observations times the rate of convergence.
Algorithms for LASSO can be adapted to produce robust estimators.
Abstract
We introduce new estimators for robust machine learning based on median-of-means (MOM) estimators of the mean of real valued random variables. These estimators achieve optimal rates of convergence under minimal assumptions on the dataset. The dataset may also have been corrupted by outliers on which no assumption is granted. We also analyze these new estimators with standard tools from robust statistics. In particular, we revisit the concept of breakdown point. We modify the original definition by studying the number of outliers that a dataset can contain without deteriorating the estimation properties of a given estimator. This new notion of breakdown number, that takes into account the statistical performances of the estimators, is non-asymptotic in nature and adapted for machine learning purposes. We proved that the breakdown number of our estimator is of the order of (number of…
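The building block named in the abstract, the median-of-means estimator of the mean of a real-valued random variable, can be sketched as follows. This is a minimal illustration of the classical MOM mean estimator only, not the paper's learning estimators. The function name `median_of_means`, the parameters `n_blocks` and `rng`, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def median_of_means(x, n_blocks, rng=None):
    """Classical median-of-means estimate of the mean of a 1-D sample.

    Hypothetical helper for illustration: split the (shuffled) sample into
    n_blocks blocks, average each block, return the median of the averages.
    """
    x = np.asarray(x, dtype=float).ravel()
    if not 1 <= n_blocks <= x.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    # Shuffle so that block membership does not depend on the input order.
    generator = np.random.default_rng(rng)
    shuffled = generator.permutation(x)
    # array_split tolerates block sizes that differ by one element.
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))


# Synthetic example (assumed data): Gaussian sample with true mean 1.0,
# then 10 observations replaced by gross outliers.
data_rng = np.random.default_rng(0)
sample = data_rng.normal(loc=1.0, scale=1.0, size=1000)
sample[:10] = 1e6

empirical_mean = float(np.mean(sample))
mom_estimate = median_of_means(sample, n_blocks=41, rng=0)
print("empirical mean:", empirical_mean)
print("median of means:", mom_estimate)
```

With 10 outliers and 41 blocks, at most 10 block means can be contaminated, so more than half of the blocks stay clean and the median falls among the clean block means. The empirical mean, by contrast, is dragged far from 1.0. The number of blocks is the tuning knob here: it has to exceed twice the number of outliers for this argument to hold.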
