Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent
Ruichong Zhang

TL;DR
This paper introduces a new gradient-compatible distance function, the Sliced Cramér 2-distance, for learning Gaussian Mixture Models. It integrates directly with neural networks and comes with theoretical guarantees.
Contribution
It derives a closed-form distance formula for two univariate GMMs and proposes a new distance for multivariate GMMs that is easy to compute and compatible with gradient descent.
Findings
Closed-form formula for univariate GMMs.
Proposed Sliced Cramér 2-distance for multivariate GMMs.
Demonstrated effectiveness in distributional reinforcement learning.
Abstract
Learning Gaussian Mixture Models (GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, GMMs have a wide range of applications, from statistics and computer vision to distributional reinforcement learning. However, few known methods can fit or learn these models; examples include Expectation-Maximization algorithms and the Sliced Wasserstein Distance. Even fewer are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed-form formula for the distance between two GMMs in the univariate (one-dimensional) case, then propose a distance function called the Sliced Cramér 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the…
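As a rough illustration of the two ideas in the abstract, the sketch below computes a squared Cramér 2-distance between univariate GMMs in closed form and then averages it over random one-dimensional projections of multivariate GMMs. It is not taken from the paper. It relies on the standard identity ∫(F − G)² dx = E|X − Y| − ½E|X − X′| − ½E|Y − Y′| and on the mean of a folded normal, and the paper's exact formula, normalization, and slicing scheme may differ. All function names, the (weight, mean, variance) tuple layout, and the Monte Carlo choice of directions are assumptions made for illustration.

```python
import math
import random


def mean_abs_normal(m, s):
    """E|Z| for Z ~ N(m, s^2), i.e. the mean of a folded normal."""
    if s == 0.0:
        return abs(m)
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s))
            + m * math.erf(m / (s * math.sqrt(2.0))))


def expected_abs_diff(p, q):
    """E|X - Y| for independent X ~ p, Y ~ q.

    Each univariate GMM is a list of (weight, mean, variance) tuples
    (a hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for wi, mi, vi in p:
        for wj, mj, vj in q:
            # X_i - Y_j is Gaussian with mean mi - mj and variance vi + vj.
            total += wi * wj * mean_abs_normal(mi - mj, math.sqrt(vi + vj))
    return total


def cramer2_sq_1d(p, q):
    """Squared Cramer 2-distance, the integral of (F_p - F_q)^2, in closed form."""
    return (expected_abs_diff(p, q)
            - 0.5 * expected_abs_diff(p, p)
            - 0.5 * expected_abs_diff(q, q))


def project(gmm, theta):
    """Project a multivariate GMM [(weight, mean_vec, cov_matrix)] onto theta."""
    d = len(theta)
    out = []
    for w, mu, cov in gmm:
        m = sum(theta[a] * mu[a] for a in range(d))
        v = sum(theta[a] * cov[a][b] * theta[b]
                for a in range(d) for b in range(d))
        out.append((w, m, v))
    return out


def sliced_cramer2_sq(p, q, n_slices=64, seed=0):
    """Monte Carlo average of the 1-D distance over random unit directions
    (an assumed slicing scheme, not necessarily the paper's)."""
    rng = random.Random(seed)
    dim = len(p[0][1])
    total = 0.0
    for _ in range(n_slices):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        theta = [x / norm for x in g]
        total += cramer2_sq_1d(project(p, theta), project(q, theta))
    return total / n_slices
```

Every step is built from `exp`, `erf`, and sums, so the same expressions written in an automatic-differentiation framework would be differentiable in the weights, means, and variances. That differentiability is what makes a distance of this kind usable as a gradient-descent loss.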
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
