Pair Correlation Factor and the Sample Complexity of Gaussian Mixtures

Farzad Aryan

arXiv:2508.03633 · cs.LG · August 6, 2025

TL;DR

This paper introduces the Pair Correlation Factor (PCF), a new geometric measure that better predicts the sample complexity of learning Gaussian Mixture Models than previous gap-based metrics.

Contribution

The paper proposes the PCF as a novel geometric property influencing GMM sample complexity and gives an algorithm with improved sample complexity bounds in the uniform spherical case.

Findings

01

PCF more accurately predicts sample complexity than minimum gap

02

New algorithm with improved sample complexity bounds for uniform spherical GMMs

03

More than the usual $ϵ^{-2}$ samples can be necessary, depending on the PCF

Abstract

We study the problem of learning Gaussian Mixture Models (GMMs) and ask: which structural properties govern their sample complexity? Prior work has largely tied this complexity to the minimum pairwise separation between components, but we demonstrate this view is incomplete. We introduce the Pair Correlation Factor (PCF), a geometric quantity capturing the clustering of component means. Unlike the minimum gap, the PCF more accurately dictates the difficulty of parameter recovery. In the uniform spherical case, we give an algorithm with improved sample complexity bounds, showing when more than the usual $ϵ^{-2}$ samples are necessary.
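
To make the contrast with the minimum gap concrete, the following is a minimal sketch, not taken from the paper. It computes the minimum pairwise separation for two hypothetical sets of component means and shows that the two sets share the same minimum gap even though one is far more tightly clustered. The helper `count_close_pairs`, the example means, and the radius of 3.0 are illustrative assumptions only; the abstract does not state the PCF's formula, so this count is a crude stand-in for "clustering of the means" and should not be read as the PCF itself.

```python
import itertools
import math


def min_gap(means):
    """Smallest Euclidean distance between any two component means."""
    return min(math.dist(a, b) for a, b in itertools.combinations(means, 2))


def count_close_pairs(means, radius):
    """Number of mean pairs within `radius` of each other.

    Hypothetical helper for illustration; this is NOT the paper's PCF.
    """
    return sum(
        1
        for a, b in itertools.combinations(means, 2)
        if math.dist(a, b) <= radius
    )


# Two assumed 1-D configurations of four component means.
spread = [(0.0,), (1.0,), (10.0,), (20.0,)]    # one close pair, rest far apart
clustered = [(0.0,), (1.0,), (2.0,), (3.0,)]   # every pair is close

gap_spread = min_gap(spread)
gap_clustered = min_gap(clustered)
close_spread = count_close_pairs(spread, radius=3.0)
close_clustered = count_close_pairs(clustered, radius=3.0)

print(gap_spread, gap_clustered)        # the minimum gaps coincide
print(close_spread, close_clustered)    # the clustering differs
```

Both configurations have a minimum gap of 1.0, so a gap-based measure cannot tell them apart, while the number of close pairs (1 versus 6) differs. This is the kind of geometric information about the means that the abstract says the minimum gap misses.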
