Robust Mixture Learning when Outliers Overwhelm Small Groups
Daniil Dmitriev, Rares-Darius Buhai, Stefan Tiegel, Alexander Wolters, Gleb Novikov, Amartya Sanyal, David Steurer, Fanny Yang

TL;DR
This paper introduces a new algorithm for robustly estimating mixture means in the presence of many outliers, especially when outliers can mimic additional mixture components, improving guarantees over previous methods.
Contribution
The paper presents an algorithm for list-decodable mixture learning that attains order-optimal error guarantees for each mixture mean, tolerates outlier fractions larger than the smallest mixing weight, and exploits mixture separation for improved accuracy.
Findings
Achieves order-optimal error guarantees for each mixture mean with a minimal list-size overhead.
Handles high outlier fractions that can simulate extra components.
Effective even with non-separated mixtures, with strong guarantees for separated cases.
Abstract
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, much less is known when outliers may crowd out low-weight clusters - a setting we refer to as list-decodable mixture learning (LD-ML). In this case, adversarial outliers can simulate additional spurious mixture components. Hence, if all means of the mixture must be recovered up to a small error in the output list, the list size needs to be larger than the number of (true) components. We propose an algorithm that obtains order-optimal error guarantees for each mixture mean with a minimal list-size overhead, significantly improving upon list-decodable mean estimation, the only existing method that is applicable for LD-ML. Although improvements are…
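The setting in the abstract can be made concrete with a small simulation. The sketch below is not the paper's algorithm; it only generates data in which outliers form a spurious cluster that outnumbers the smallest true component. It assumes unit-covariance Gaussian components and assumes the mixing weights and the outlier fraction `eps` sum to 1. The function name `sample_contaminated_mixture` and all parameter values are hypothetical, chosen for illustration.

```python
import numpy as np

def sample_contaminated_mixture(n, means, weights, eps, fake_mean, seed=0):
    """Draw n points: inliers from a Gaussian mixture, plus outliers that
    imitate one extra component. Illustrative data model only (assumption:
    sum(weights) + eps == 1 and identity covariance for every component)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum() + eps, 1.0)

    n_out = int(round(eps * n))   # number of adversarial points
    n_in = n - n_out              # number of genuine mixture samples
    dim = means.shape[1]

    # Genuine samples: pick a component in proportion to its weight, add noise.
    labels = rng.choice(len(weights), size=n_in, p=weights / weights.sum())
    inliers = means[labels] + rng.standard_normal((n_in, dim))

    # Outliers: the adversary places a Gaussian-looking cluster at fake_mean.
    outliers = np.asarray(fake_mean, dtype=float) + rng.standard_normal((n_out, dim))

    X = np.vstack([inliers, outliers])
    y = np.concatenate([labels, -np.ones(n_out, dtype=int)])  # -1 marks outliers
    return X, y

# Three true components; the smallest has weight 0.05, below eps = 0.2.
X, y = sample_contaminated_mixture(
    n=10_000,
    means=[[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]],
    weights=[0.5, 0.25, 0.05],
    eps=0.2,
    fake_mean=[20.0, 20.0],
)
smallest_true = int((y == 2).sum())
spurious = int((y == -1).sum())
print(f"smallest true cluster: {smallest_true} points, spurious cluster: {spurious} points")
```

From the samples alone, the spurious cluster is indistinguishable from a genuine component and is larger than the smallest true one, which is why an output list that covers every true mean must be allowed to contain more entries than the number of true components.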
Taxonomy
Topics: Social and Economic Development in India · Bayesian Methods and Mixture Models · Survey Sampling and Estimation Techniques
Methods: Balanced Selection
