Robust Model Selection and Nearly-Proper Learning for GMMs

Jerry Li; Allen Liu; Ankur Moitra

arXiv:2106.02774 · cs.DS · April 25, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a robust, efficient method for model selection in univariate Gaussian mixture models, capable of handling adversarial corruptions and estimating the number of components with provable guarantees.

Contribution

It presents the first polynomial-time algorithm with provable guarantees for robustly estimating the number of components in univariate GMMs, even under adversarial corruptions.

Findings

01 Constructs a GMM with Õ(k) components that approximates the distribution to within Õ(ε)

02 Uses poly(k/ε) samples and runs in poly(k/ε) time

03 Extends the techniques to reconstructing Fourier-sparse signals
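Read formally, Finding 01 is a statement about the objects below. This sketch uses standard textbook definitions and our own notation (not necessarily the paper's); the exact factors hidden in $\widetilde{O}(\cdot)$ and any success-probability qualifiers are in the paper and are not restated here.

```latex
% Univariate k-component GMM (standard definition)
M(x) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x;\, \mu_i, \sigma_i^2),
\qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1

% Total variation distance between densities p and q
d_{\mathrm{TV}}(p, q) = \frac{1}{2} \int_{-\infty}^{\infty} \lvert p(x) - q(x) \rvert \, dx

% Finding 01, informally: if the sampled distribution D satisfies
%   d_TV(D, M) <= epsilon for some k-component GMM M,
% then from poly(k/epsilon) samples, in poly(k/epsilon) time, one outputs \widehat{M} with
\#\text{components}(\widehat{M}) = \widetilde{O}(k),
\qquad d_{\mathrm{TV}}(D, \widehat{M}) \le \widetilde{O}(\epsilon)
```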

Abstract

In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance? The problem of estimating the number of components, also called model selection, is important in its own right, but there are essentially no known efficient algorithms with provable guarantees, let alone ones that can tolerate adversarial corruptions. In this work, we study the problem of robust model selection for univariate Gaussian mixture models (GMMs). Given $\mathrm{poly}(k/\epsilon)$ samples from a distribution that is $\epsilon$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\widetilde{O}(k)$ components that approximates the distribution to within $\widetilde{O}(\epsilon)$ in $\mathrm{poly}(k/\epsilon)$ time. Thus we are able to approximately determine the minimum number of…
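To make "$\epsilon$-close in TV distance" concrete, here is a minimal Python sketch (standard library only) that numerically approximates the total variation distance between two univariate GMMs. It illustrates the problem setting only; it is not the paper's model-selection algorithm. The helper names (`gaussian_pdf`, `gmm_pdf`, `tv_distance`), the integration window, and the example mixtures are our own hypothetical choices.

```python
import math

# Hypothetical helpers for illustration only; NOT the paper's algorithm.


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, weights, means, sigmas):
    """Density at x of the mixture sum_i w_i * N(mu_i, sigma_i^2)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


def tv_distance(p, q, lo=-20.0, hi=20.0, n=100_000):
    """Approximate d_TV(p, q) = (1/2) * integral of |p(x) - q(x)| dx.

    p and q are (weights, means, sigmas) triples. Uses the midpoint rule on
    [lo, hi]; mass outside this window is ignored, which assumes the window
    covers essentially all of both densities.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += abs(gmm_pdf(x, *p) - gmm_pdf(x, *q))
    return 0.5 * total * h


if __name__ == "__main__":
    eps = 0.05
    # A clean 2-component GMM M.
    clean = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
    # Move an eps fraction of mass to a far-away bump: (1 - eps) * M + eps * N(10, 1).
    corrupted = ([0.5 * (1 - eps), 0.5 * (1 - eps), eps],
                 [-1.0, 1.0, 10.0],
                 [1.0, 1.0, 1.0])
    print(round(tv_distance(clean, corrupted), 3))
```

In the example, the corrupted density minus the clean one equals $\epsilon(q - m)$ with $q$ the far-away bump, so the TV distance is $\epsilon$ times a quantity very close to 1 and the printed value should be close to `eps`. The corrupted distribution is itself written as a 3-component GMM, yet it lies within about $\epsilon$ of a 2-component one, which is why counting components "up to $\epsilon$ in TV" is a natural way to phrase robust model selection.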

Videos

Robust Model Selection and Nearly-Proper Learning for GMMs · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques