Robust Model Selection and Nearly-Proper Learning for GMMs
Jerry Li, Allen Liu, Ankur Moitra

TL;DR
This paper introduces a robust, efficient method for model selection in univariate Gaussian mixture models, capable of handling adversarial corruptions and estimating the number of components with provable guarantees.
Contribution
It presents the first polynomial-time algorithm for robustly estimating the number of components in GMMs with provable guarantees, even under adversarial noise.
Findings
Constructs a GMM with ~O(k) components approximating the distribution within ~O(ε)
Uses polynomially many samples and runs in poly(k/ε) time
Extends techniques to Fourier-sparse signal reconstruction
Abstract
In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right, but there are essentially no known efficient algorithms with provable guarantees, let alone ones that can tolerate adversarial corruptions. In this work, we study the problem of robust model selection for univariate Gaussian mixture models (GMMs). Given poly(k/ε) samples from a distribution that is ε-close in TV distance to a GMM with k components, we can construct a GMM with ~O(k) components that approximates the distribution to within ~O(ε) in poly(k/ε) time. Thus we are able to approximately determine the minimum number of…
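To make the notion of being ε-close in TV distance to a k-component GMM concrete, here is a minimal sketch that numerically estimates the total variation distance between a clean two-component GMM and a contaminated version of it. It is not the paper's algorithm. The helper names (`gmm_pdf`, `tv_distance`), the mixture parameters, and the additive contamination model (which is only one special case of TV-closeness) are all assumptions chosen for illustration.

```python
import math


def gmm_pdf(x, weights, means, stds):
    """Density of a univariate Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def tv_distance(pdf_p, pdf_q, lo=-20.0, hi=20.0, n=40000):
    """Approximate TV(p, q) = 0.5 * integral of |p - q| with a midpoint rule.

    The integration window [lo, hi] is an assumption; it must cover
    essentially all of the mass of both densities.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += abs(pdf_p(x) - pdf_q(x))
    return 0.5 * total * h


def clean(x):
    # Hypothetical "true" model: a GMM with k = 2 components.
    return gmm_pdf(x, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])


def noise(x):
    # Hypothetical outlier distribution placed far from the clean mixture.
    return gmm_pdf(x, [1.0], [8.0], [0.1])


eps = 0.05


def corrupted(x):
    # Additive contamination: replace an eps fraction of the mass by outliers.
    return (1.0 - eps) * clean(x) + eps * noise(x)


tv = tv_distance(clean, corrupted)
print(f"estimated TV distance: {tv:.4f} (eps = {eps})")
```

Because the corrupted density differs from the clean one by exactly eps * (noise - clean), its TV distance from the clean GMM is at most eps, so it is a valid input in the setting the abstract describes. A robust model-selection procedure would be expected to report roughly two components here, rather than spending a third component on the outlier bump.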
Taxonomy
Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques
