Learning Mixture Density via Natural Gradient Expectation Maximization

Yutao Chen; Jasmine Bayrooti; Steven Morad

arXiv:2602.10602 · cs.LG · February 12, 2026


TL;DR

This paper introduces nGEM, a natural gradient-based optimization method for mixture density networks, significantly improving convergence speed and scalability by leveraging information geometry and an EM framework.

Contribution

It develops the nGEM algorithm, integrating natural gradient descent with EM for mixture density networks, enhancing training efficiency and scalability.
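The link between EM and natural gradient descent can be seen in a standard textbook special case. This is not the paper's nGEM objective. The sketch uses a 1-D Gaussian mixture with a fixed, shared standard deviation and updates only the means. With the E-step responsibilities held fixed, a single natural-gradient step of size 1 on the EM surrogate lands exactly on the closed-form M-step. The function names and toy data are hypothetical and chosen for illustration.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def e_step(ys, weights, mus, sigma):
    """E-step: r[i][k] = posterior probability that point i came from component k."""
    resp = []
    for y in ys:
        joint = [w * gaussian_pdf(y, m, sigma) for w, m in zip(weights, mus)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step_means(ys, resp):
    """Closed-form M-step for the means: responsibility-weighted averages."""
    new_mus = []
    for k in range(len(resp[0])):
        r_k = [r[k] for r in resp]
        new_mus.append(sum(r * y for r, y in zip(r_k, ys)) / sum(r_k))
    return new_mus

def natural_gradient_means(ys, resp, mus, sigma, lr=1.0):
    """One natural-gradient ascent step on the mean-dependent part of the EM
    surrogate, Q(mu) = sum_ik r_ik log N(y_i | mu_k, sigma^2), with r held fixed.

    grad_k   = dQ/dmu_k = sum_i r_ik (y_i - mu_k) / sigma^2
    fisher_k = sum_i r_ik / sigma^2  (responsibility-weighted Fisher information of mu_k)
    """
    new_mus = []
    for k, mu in enumerate(mus):
        r_k = [r[k] for r in resp]
        grad = sum(r * (y - mu) for r, y in zip(r_k, ys)) / sigma**2
        fisher = sum(r_k) / sigma**2
        new_mus.append(mu + lr * grad / fisher)
    return new_mus

# Toy 1-D data with two clusters and a shared, fixed standard deviation.
ys = [-2.1, -1.9, -2.3, 1.8, 2.2, 2.0, 0.1]
weights, mus, sigma = [0.5, 0.5], [-1.0, 1.0], 1.0

resp = e_step(ys, weights, mus, sigma)
em_mus = m_step_means(ys, resp)
ng_mus = natural_gradient_means(ys, resp, mus, sigma)
print("EM M-step:       ", em_mus)
print("natural gradient:", ng_mus)
```

Setting `lr` below 1 takes a damped step in the same direction. That kind of step is useful when the mixture parameters come from a network and no closed-form M-step is available. This sketch does not show how the paper handles that case.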

Findings

01. nGEM achieves up to 10× faster convergence.
02. nGEM scales well to high-dimensional data.
03. nGEM adds minimal computational overhead.

Abstract

Mixture density networks are neural networks that produce Gaussian mixtures to represent continuous multimodal conditional densities. Standard training procedures involve maximum likelihood estimation using the negative log-likelihood (NLL) objective, which suffers from slow convergence and mode collapse. In this work, we improve the optimization of mixture density networks by integrating their information geometry. Specifically, we interpret mixture density networks as deep latent-variable models and analyze them through an expectation maximization framework, which reveals surprising theoretical connections to natural gradient descent. We then exploit these connections to derive the natural gradient expectation maximization (nGEM) objective. We show empirically that nGEM achieves up to 10× faster convergence while adding almost zero computational overhead, and scales well to…
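The abstract describes the standard baseline only in words: an MDN outputs a Gaussian mixture and is trained with the NLL. The plain-Python sketch below assumes scalar targets, with one logit, one mean, and one log standard deviation per component; names such as `mdn_nll` are hypothetical. It computes the NLL with log-sum-exp for numerical stability. It then checks numerically that the gradient with respect to each mean equals the responsibility-weighted component gradient. That identity is what ties NLL training to the E-step of EM. The sketch does not implement nGEM, because the abstract does not spell out its objective.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def gaussian_logpdf(y, mu, log_sigma):
    """log N(y | mu, exp(log_sigma)^2) for a scalar target."""
    sigma = math.exp(log_sigma)
    return -0.5 * ((y - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2.0 * math.pi)

def mdn_nll(y, logits, mus, log_sigmas):
    """NLL of a scalar target under the mixture an MDN head would output for one input.

    Returns (nll, resp), where resp[k] is the posterior over components,
    i.e. the E-step responsibilities in the EM view.
    """
    log_norm = logsumexp(logits)  # softmax normalizer for the mixture weights
    log_joint = [
        (l - log_norm) + gaussian_logpdf(y, m, ls)
        for l, m, ls in zip(logits, mus, log_sigmas)
    ]
    log_lik = logsumexp(log_joint)
    resp = [math.exp(lj - log_lik) for lj in log_joint]
    return -log_lik, resp

def nll_grad_mu(y, mus, log_sigmas, resp):
    """Analytic dNLL/dmu_k = -r_k (y - mu_k) / sigma_k^2 (responsibility-weighted)."""
    return [-r * (y - m) / math.exp(2.0 * ls) for r, m, ls in zip(resp, mus, log_sigmas)]

# Hypothetical head outputs for one input x with K = 3 components.
y = 0.3
logits = [0.2, -0.5, 1.0]
mus = [-1.0, 0.0, 2.0]
log_sigmas = [0.0, -0.7, 0.3]

nll, resp = mdn_nll(y, logits, mus, log_sigmas)
analytic = nll_grad_mu(y, mus, log_sigmas, resp)

# Central finite differences on each mean, to check the identity numerically.
eps = 1e-6
numeric = []
for k in range(len(mus)):
    up = mus[:k] + [mus[k] + eps] + mus[k + 1:]
    down = mus[:k] + [mus[k] - eps] + mus[k + 1:]
    numeric.append((mdn_nll(y, logits, up, log_sigmas)[0]
                    - mdn_nll(y, logits, down, log_sigmas)[0]) / (2 * eps))
print("NLL:", nll, "responsibilities:", resp)
print("analytic:", analytic)
print("numeric: ", numeric)
```

In a real MDN, `logits`, `mus`, and `log_sigmas` would be produced per input by the network. The NLL would then be averaged over a batch before backpropagation.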

