Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Sumeet Singh, Stephen Tu, Vikas Sindhwani

TL;DR
This paper demonstrates that energy-based models can be effectively trained as policies for robotic tasks, challenging previous beliefs about their impracticality, and shows they can outperform diffusion models in complex benchmarks.
Contribution
We develop a practical, asymptotically consistent training algorithm for energy-based models as policies, incorporating ranking noise contrastive estimation and learnable negative sampling.
Findings
Energy-based models can be trained effectively for policy representation.
Our method outperforms diffusion models in multi-modal robotic benchmarks.
The proposed approach is mathematically justified and scalable.
Abstract
A crucial design decision for any robot learning pipeline is the choice of policy representation: what type of model should be used to generate the next set of robot actions? Owing to the inherent multi-modal nature of many robotic tasks, combined with the recent successes in generative modeling, researchers have turned to state-of-the-art probabilistic models such as diffusion models for policy representation. In this work, we revisit the choice of energy-based models (EBM) as a policy class. We show that the prevailing folklore -- that energy models in high-dimensional continuous spaces are impractical to train -- is false. We develop a practical training objective and algorithm for energy models which combines several key ingredients: (i) ranking noise contrastive estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial joint training. We prove that our…
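The abstract names ranking noise contrastive estimation (R-NCE) as the first ingredient. As a rough illustration only, the sketch below computes the ranking NCE loss in its standard textbook form: one positive action is ranked against K negatives drawn from a sampler q, using the logits -E(x, y_k) - log q(y_k | x). The function name `rnce_loss` and the array layout are assumptions made for this example and are not taken from the paper; the learnable sampler and the joint training procedure are not shown.

```python
import numpy as np

def rnce_loss(energies, log_q):
    """Ranking NCE loss (standard form; hypothetical helper, not the authors' code).

    energies: array of shape (B, K+1) holding E(x, y_k); column 0 is the
              positive (demonstrated) action, columns 1..K are negatives.
    log_q:    array of shape (B, K+1) holding log q(y_k | x), the log-density
              of every candidate under the negative sampler.
    """
    # Logit for each candidate: negative energy, corrected by the sampler density.
    logits = -energies - log_q
    # Numerically stable log-sum-exp over the K+1 candidates.
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # Negative log-probability of picking the positive, averaged over the batch.
    return float(np.mean(lse - logits[:, 0]))
```

With equal energies and equal sampler densities across K+1 = 4 candidates, this loss equals log 4; it approaches zero as the positive's energy falls far below that of the negatives.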
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
