Improved Convergence Guarantees for Learning Gaussian Mixture Models by EM and Gradient EM

Nimrod Segol; Boaz Nadler

arXiv:2101.00575 · cs.LG · September 24, 2021

TL;DR

This paper provides sharper convergence analysis and improved sample complexity bounds for EM and gradient EM algorithms in learning Gaussian Mixture Models with known weights and identity covariance, under certain separation conditions.

Contribution

The paper proves convergence of EM and gradient EM from a larger initialization region than previous works, and reduces the dependence of the required sample size on the maximal separation between components from quadratic to logarithmic.

Findings

1. Convergence to global optima from larger initial regions.
2. Sample complexity depends logarithmically on component separation.
3. Improved theoretical guarantees for EM algorithms.

Abstract

We consider the problem of estimating the parameters of a Gaussian Mixture Model with K components of known weights, all with an identity covariance matrix. We make two contributions. First, at the population level, we present a sharper analysis of the local convergence of EM and gradient EM, compared to previous works. Assuming a separation of $\Omega(\sqrt{\log K})$, we prove convergence of both methods to the global optima from an initialization region larger than those of previous works. Specifically, the initial guess of each component can be as far as (almost) half its distance to the nearest Gaussian. This is essentially the largest possible contraction region. Our second contribution is improved sample size requirements for accurate estimation by EM and gradient EM. In previous works, the required number of samples had a quadratic dependence on the maximal separation between the K…
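
To make the setting in the abstract concrete, the following is a minimal sketch of one sample-level EM step and one gradient EM step for a mixture with known weights and identity covariance, where only the means are updated. It is an illustration under stated assumptions, not the authors' code: the function names, the toy two-component data, the initial guesses, the iteration counts, and the step size of 1.0 are all hypothetical choices made for this example.

```python
import math
import random


def responsibilities(x, means, weights):
    """Posterior probability of each component for point x (identity covariance)."""
    # log(weight_k) - ||x - mu_k||^2 / 2, shifted by the maximum for numerical stability.
    logs = [
        math.log(w) - 0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        for w, mu in zip(weights, means)
    ]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def em_step(data, means, weights):
    """One EM update of the means; weights and covariances are held fixed."""
    K, d = len(means), len(means[0])
    num = [[0.0] * d for _ in range(K)]
    den = [0.0] * K
    for x in data:
        r = responsibilities(x, means, weights)
        for k in range(K):
            den[k] += r[k]
            for j in range(d):
                num[k][j] += r[k] * x[j]
    # New mean of component k: responsibility-weighted average of the data.
    return [[num[k][j] / den[k] for j in range(d)] for k in range(K)]


def gradient_em_step(data, means, weights, step):
    """One gradient EM update: mu_k += step * average_i[r_k(x_i) * (x_i - mu_k)]."""
    K, d, n = len(means), len(means[0]), len(data)
    grad = [[0.0] * d for _ in range(K)]
    for x in data:
        r = responsibilities(x, means, weights)
        for k in range(K):
            for j in range(d):
                grad[k][j] += r[k] * (x[j] - means[k][j]) / n
    return [[means[k][j] + step * grad[k][j] for j in range(d)] for k in range(K)]


# Hypothetical toy problem: two well-separated components in two dimensions.
rng = random.Random(0)
true_means = [[-4.0, 0.0], [4.0, 0.0]]
weights = [0.5, 0.5]  # known mixing weights, as in the setting of the abstract
data = []
for _ in range(2000):
    mu = true_means[0] if rng.random() < weights[0] else true_means[1]
    data.append([rng.gauss(m, 1.0) for m in mu])

# Each initial guess lies within half the distance between the two true means.
init = [[-1.5, 1.0], [1.5, -1.0]]

em_means = init
for _ in range(20):
    em_means = em_step(data, em_means, weights)

gem_means = init
for _ in range(200):
    gem_means = gradient_em_step(data, gem_means, weights, step=1.0)  # assumed step size

print("EM means:", em_means)
print("Gradient EM means:", gem_means)
```

The two updates share the same responsibilities and differ only in how they move the means: EM jumps to the weighted average, while gradient EM takes a step of fixed size toward it. The initial guesses here are placed inside the region described in the abstract only for this toy configuration; the sketch does not check the separation condition or reproduce any bound from the paper.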
