Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks

Carles Domingo-Enrich; Alberto Bietti; Marylou Gabrié; Joan Bruna; Eric Vanden-Eijnden

arXiv:2107.05134·cs.LG·February 16, 2022

Open Access

TL;DR

This paper introduces a dual formulation for training energy-based models with overparametrized shallow neural networks, providing a theoretical foundation for a two-time-scale gradient ascent-descent algorithm and its variants, including score matching.

Contribution

It derives a dual variational principle for maximum likelihood EBMs with shallow neural networks, justifies a two-time-scale GDA training algorithm, and connects it to score matching.

Findings

1. GDA performs best when features and particles are updated on similar time scales.

2. Restarts of particles at data points relate to score matching.

3. Numerical experiments support the proposed training approach.

Abstract

Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is non-convex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the feature-learning and lazy linearized regimes. In the feature-learning regime, this dual formulation justifies using a two-time-scale gradient ascent-descent (GDA) training algorithm in which one concurrently updates the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and…
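The concurrent particle and neuron updates described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a ReLU shallow energy f(x) = (1/m) Σ_j w_j relu(θ_j · x), samples constrained to the unit sphere (a compact domain keeps the Gibbs measure normalizable), a single unadjusted Langevin step per iteration for the particles, and a crude renormalization back onto the sphere. The names `energy_and_grads`, `gda_step`, `lr_particles`, `lr_params`, and `restart_prob` are hypothetical and chosen only for illustration.

```python
import numpy as np

def energy_and_grads(x, theta, w):
    """Assumed shallow energy f(x) = (1/m) * sum_j w_j * relu(theta_j . x).

    Returns f per sample, the gradient in x per sample, and the gradients
    in (theta, w) averaged over the samples in x.
    """
    n, m = x.shape[0], theta.shape[0]
    pre = x @ theta.T                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)              # ReLU features
    mask = (pre > 0).astype(x.dtype)        # ReLU derivative
    f = act @ w / m                         # (n,)
    grad_x = (mask * w) @ theta / m         # (n, d)
    grad_theta = (mask * w).T @ x / (m * n) # (m, d), sample average
    grad_w = act.mean(axis=0) / m           # (m,), sample average
    return f, grad_x, grad_theta, grad_w

def gda_step(data, particles, theta, w, rng,
             lr_particles=1e-2, lr_params=1e-2, restart_prob=0.0):
    """One concurrent update of particles (sample space) and neurons."""
    # Optional variant: restart some particles at random data points.
    if restart_prob > 0:
        restart = rng.random(particles.shape[0]) < restart_prob
        idx = rng.integers(0, data.shape[0], size=particles.shape[0])
        particles = np.where(restart[:, None], data[idx], particles)

    # Gradients at the current particles (stand-in for model samples).
    _, grad_x, g_theta_model, g_w_model = energy_and_grads(particles, theta, w)
    _, _, g_theta_data, g_w_data = energy_and_grads(data, theta, w)

    # Particles: one Langevin step on the energy, then renormalize
    # onto the unit sphere (an assumption of this sketch).
    noise = rng.standard_normal(particles.shape)
    particles = particles - lr_particles * grad_x + np.sqrt(2 * lr_particles) * noise
    particles = particles / np.linalg.norm(particles, axis=1, keepdims=True)

    # Neurons: gradient step on the negative log-likelihood, estimated as
    # the data average minus the particle average of the parameter gradient.
    theta = theta - lr_params * (g_theta_data - g_theta_model)
    w = w - lr_params * (g_w_data - g_w_model)
    return particles, theta, w

# Toy usage with synthetic data on the sphere (illustrative values only).
rng = np.random.default_rng(0)
d, m, n = 3, 64, 128
data = rng.standard_normal((n, d))
data /= np.linalg.norm(data, axis=1, keepdims=True)
particles = rng.standard_normal((n, d))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
theta = rng.standard_normal((m, d))
w = rng.standard_normal(m)

for _ in range(100):
    particles, theta, w = gda_step(data, particles, theta, w, rng,
                                   restart_prob=0.05)
```

In this sketch the ratio `lr_particles / lr_params` sets the relative time scale of the two updates, and `restart_prob` controls how often particles are reset to data points; both values above are arbitrary and would need tuning in any real experiment.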

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference