Gaussian mixture layers for neural networks

Sinho Chewi; Philippe Rigollet; Yuling Yan

arXiv:2508.04883·cs.LG·August 8, 2025

TL;DR

This paper introduces Gaussian mixture (GM) layers for neural networks, derives their training dynamics from Wasserstein gradient flows, and demonstrates on classification tasks that they are effective and behave differently from traditional layers.

Contribution

The paper proposes a Gaussian mixture layer whose training dynamics are derived from Wasserstein gradient flows, extending the mean-field approach from infinitely wide networks to a layer that can be implemented in practice.

Findings

01. GM layers achieve test performance comparable to that of two-layer networks.

02. GM layers exhibit different dynamics than classical fully connected layers.

03. The approach bridges mean-field theory and practical neural network design.

Abstract

The mean-field theory for two-layer neural networks considers infinitely wide networks that are linearly parameterized by a probability measure over the parameter space. This nonparametric perspective has significantly advanced both the theoretical and conceptual understanding of neural networks, with substantial efforts made to validate its applicability to networks of moderate width. In this work, we explore the opposite direction, investigating whether dynamics can be directly implemented over probability measures. Specifically, we employ Gaussian mixture models as a flexible and expressive parametric family of distributions together with the theory of Wasserstein gradient flows to derive training dynamics for such measures. Our approach introduces a new type of layer -- the Gaussian mixture (GM) layer -- that can be integrated into neural network architectures. As a proof of…
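The mean-field parameterization described in the abstract can be illustrated with a small sketch. It compares a finite-width two-layer network in mean-field scaling, (1/m) Σ_j Φ(x; θ_j), with a Monte Carlo estimate of ∫ Φ(x; θ) dμ(θ) when μ is a Gaussian mixture over the parameter space. This is not the authors' GM layer and it does not implement their Wasserstein gradient flow dynamics. The tanh feature, the equal mixture weights, the isotropic covariances, and the names `neuron`, `finite_width_output`, and `gm_output` are all assumptions made for illustration.

```python
import math
import random

def neuron(theta, x):
    """Assumed feature Phi(x; theta) = a * tanh(<w, x>), with theta = (a, w_1, ..., w_d)."""
    a, w = theta[0], theta[1:]
    return a * math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def finite_width_output(thetas, x):
    """Width-m two-layer network in mean-field scaling: (1/m) * sum_j Phi(x; theta_j)."""
    return sum(neuron(t, x) for t in thetas) / len(thetas)

def gm_output(means, stds, x, n_per_component=200, seed=0):
    """Monte Carlo estimate of the integral of Phi(x; theta) against an equal-weight
    mixture of isotropic Gaussians N(means[k], stds[k]^2 * I) (assumed form).
    Each component receives the same number of samples."""
    rng = random.Random(seed)
    total = 0.0
    for m_k, s_k in zip(means, stds):
        for _ in range(n_per_component):
            # Draw theta ~ N(m_k, s_k^2 * I), coordinate by coordinate.
            theta = [rng.gauss(m, s_k) for m in m_k]
            total += neuron(theta, x)
    return total / (len(means) * n_per_component)

# Hypothetical parameters: two components in a 3-dimensional parameter space (a, w_1, w_2).
means = [[1.0, 0.5, -0.3], [-0.7, 0.2, 0.9]]
x = [0.4, -1.2]

print(finite_width_output(means, x))       # width-2 network with neurons at the means
print(gm_output(means, [0.0, 0.0], x))     # zero spread: matches the line above up to rounding
print(gm_output(means, [0.3, 0.3], x))     # positive spread: the measure is smoothed around each mean
```

With zero standard deviations each Gaussian collapses to a point mass, so the mixture reduces to the empirical measure of a finite-width network. A positive spread replaces each neuron with a cloud of neurons around its mean, which is one way to read the abstract's idea of working directly with probability measures.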
