Joint Learning of Energy-based Models and their Partition Function
Michael E. Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel

TL;DR
This paper introduces a new method for jointly learning energy-based models and their partition functions using neural networks, enabling tractable training and estimation in large discrete spaces without MCMC, with applications in multilabel classification and ranking.
Contribution
The paper proposes a novel joint learning framework for EBMs and their partition functions that is tractable and extends to Fenchel-Young losses, including sparsemax, in large discrete spaces.
Findings
Provides a tractable objective for learning EBMs without MCMC
Enables estimation of the partition function on unseen data
Demonstrates effectiveness on multilabel classification and ranking tasks
Abstract
Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. However, learning EBMs by exact maximum likelihood estimation (MLE) is generally intractable, due to the need to compute the partition function (normalization constant). In this paper, we propose a novel formulation for approximately learning probabilistic EBMs in combinatorially large discrete spaces, such as sets or permutations. Our key idea is to jointly learn both an energy model and its log-partition, each parameterized as a neural network. Our approach not only provides a novel tractable objective criterion to learn EBMs by stochastic gradient descent (without relying on MCMC), but also a novel means to estimate the log-partition function on unseen data points. On the theoretical side, we show that our approach recovers the optimal MLE solution when optimizing…
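To make the idea of learning a log-partition alongside the energy more concrete, here is a minimal sketch built on a standard variational identity: for any scalar tau, tau + sum_y exp(g(y) - tau) - 1 is an upper bound on log sum_y exp(g(y)), with equality exactly at tau equal to the log-partition. Whether this is the exact objective used in the paper is an assumption; the truncated abstract does not state it. The function names (`log_partition`, `variational_bound`, `fit_tau`), the toy scores, the step size, and the initialization are all hypothetical choices for illustration. The sketch uses a single scalar tau and a tiny enumerable output set, whereas the abstract describes a neural network for the log-partition and spaces too large to enumerate.

```python
import math

def log_partition(scores):
    """Exact log-sum-exp, feasible only because the output set is tiny."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def variational_bound(scores, tau):
    """Upper bound tau + sum_y exp(score_y - tau) - 1 on the log-partition."""
    return tau + sum(math.exp(s - tau) for s in scores) - 1.0

def fit_tau(scores, steps=200, lr=0.5):
    """Plain gradient descent on the bound with respect to the scalar tau."""
    # Start at max + log(n), which is never below the true log-partition,
    # so the iterates decrease toward it without overshooting.
    tau = max(scores) + math.log(len(scores))
    for _ in range(steps):
        grad = 1.0 - sum(math.exp(s - tau) for s in scores)
        tau -= lr * grad
    return tau

if __name__ == "__main__":
    scores = [0.3, -1.2, 2.0, 0.5]  # hypothetical energies g(y) for four outputs
    print("learned tau:        ", fit_tau(scores))
    print("exact log-partition:", log_partition(scores))
```

The bound is a sum over outputs rather than a logarithm of a sum, so it can in principle be estimated without bias from sampled outputs and minimized by stochastic gradient descent. That property is what makes objectives of this kind attractive when the output space cannot be enumerated.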
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparsemax
