Distributional simplicity bias and effective convexity in Energy Based Models
Aurélien Decelle, Alfonso de Jesús Navas Gómez, Beatriz Seoane

TL;DR
This paper analyzes the training dynamics of energy-based models, showing that they learn simpler, lower-order interactions before higher-order ones. The analysis accounts for the distributional simplicity bias and characterizes the fixed points of the gradient flow.
Contribution
It provides a dynamical analysis of energy-based learning, identifying fixed points and explaining the hierarchy in learning interactions of increasing order.
Findings
Gradient flow admits data-consistent and spurious fixed points.
Perturbations around data-consistent fixed points are either stable or neutral.
Lower-order interactions are learned before higher-order ones.
Abstract
Energy-based learning is a powerful framework for generative modelling, but its training is inherently non-convex, potentially leading to sensitivity to initialisation, poor local optima, and unstable gradient dynamics. We present a dynamical analysis of energy-based learning through the lens of the effective model, which can be interpreted as either a generalised Ising model with higher-order interactions or the Fourier expansion of the energy. Under sufficient expressivity, we show that the gradient flow induced by learning strictly positive distributions over binary variables admits two types of fixed points: data-consistent points, which exactly reproduce the target distribution, and spurious points, which satisfy stationarity without matching the target distribution. Around data-consistent points, we show that perturbations are either stable or neutral, with neutral directions…
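The "Fourier expansion of the energy" mentioned in the abstract can be illustrated with a small sketch. It assumes spins in {-1, +1} and the sign convention E(s) = sum over subsets S of c_S * prod_{i in S} s_i. The paper's own conventions may differ, and the names `fourier_coefficients` and `toy_energy` are hypothetical, chosen only for this illustration. The sketch enumerates all 2^n states, so it is practical only for small n.

```python
import itertools
import numpy as np

def fourier_coefficients(energy, n):
    """Expand energy(s), s in {-1,+1}^n, as sum_S c_S * prod_{i in S} s_i.

    Returns a dict mapping each index subset S (a tuple) to c_S.
    Uses orthogonality of the parity functions under the uniform measure:
    c_S = mean over all states of energy(s) * prod_{i in S} s_i.
    """
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    values = np.array([energy(s) for s in states], dtype=float)
    coeffs = {}
    for order in range(n + 1):
        for subset in itertools.combinations(range(n), order):
            if subset:
                chi = states[:, list(subset)].prod(axis=1)
            else:
                chi = np.ones(len(states))  # empty subset: constant term
            coeffs[subset] = float(np.mean(values * chi))
    return coeffs

# Toy energy (an assumption for illustration): a field, a pairwise
# coupling, and a three-body interaction.
def toy_energy(s):
    return -0.5 * s[0] - 1.0 * s[0] * s[1] + 0.25 * s[0] * s[1] * s[2]

coeffs = fourier_coefficients(toy_energy, n=3)
nonzero = {S: c for S, c in coeffs.items() if abs(c) > 1e-12}
```

Here the size of each subset S is the interaction order: singletons act as fields, pairs as Ising couplings, and larger subsets as higher-order interactions. In this toy case only the three terms written into `toy_energy` come back non-zero.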
