TL;DR
EnergyFlow is a framework that unifies generative modeling with inverse reinforcement learning, enabling reward recovery and policy generalization without adversarial training. It is demonstrated on manipulation tasks.
Contribution
The paper introduces EnergyFlow, a method that combines denoising score matching with inverse reinforcement learning (IRL) to deliver state-of-the-art imitation learning and reward extraction without adversarial training.
Findings
EnergyFlow achieves state-of-the-art imitation performance.
It provides effective reward signals for downstream RL.
Constraining the learned field to be conservative tightens out-of-distribution generalization bounds.
Abstract
This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar energy function whose gradient is the denoising field. We establish that under maximum-entropy optimality, the score function learned via denoising score matching recovers the gradient of the expert's soft Q-function, enabling reward extraction without adversarial training. Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how score estimation errors propagate to action preferences. Empirically, EnergyFlow achieves state-of-the-art imitation performance on various manipulation tasks while providing an effective reward signal for downstream reinforcement…
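The idea of a scalar energy whose gradient serves as the denoising field can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a one-dimensional action, a Gaussian expert, a hypothetical quadratic energy E(a) = 0.5 * w * (a - b)^2, the sign convention field = -dE/da, and a closed-form least-squares solve in place of gradient-based training. Because the field is linear in the noisy action, least squares minimizes the denoising score matching loss exactly for this toy model.

```python
import random


def energy(a, w, b):
    """Hypothetical quadratic energy E(a) = 0.5 * w * (a - b)^2 (an assumption)."""
    return 0.5 * w * (a - b) ** 2


def field(a, w, b):
    """Denoising field defined as the negative energy gradient, -dE/da."""
    return -w * (a - b)


def fit_energy_dsm(actions, sigma, rng):
    """Fit (w, b) by minimizing the denoising score matching loss
    mean((field(a_noisy) - target)^2) with target = -(a_noisy - a) / sigma^2.
    The field is linear in a_noisy, so ordinary least squares is exact here."""
    noisy = [a + sigma * rng.gauss(0.0, 1.0) for a in actions]
    targets = [-(x - a) / sigma**2 for x, a in zip(noisy, actions)]
    n = len(actions)
    mean_x = sum(noisy) / n
    mean_t = sum(targets) / n
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(noisy, targets)) / n
    var = sum((x - mean_x) ** 2 for x in noisy) / n
    slope = cov / var                      # field(a) = slope * a + intercept
    intercept = mean_t - slope * mean_x
    w = -slope                             # since field(a) = -w * a + w * b
    b = intercept / w
    return w, b


rng = random.Random(0)
EXPERT_MEAN, EXPERT_STD, SIGMA = 1.0, 0.5, 0.3   # toy values, not from the paper
expert_actions = [rng.gauss(EXPERT_MEAN, EXPERT_STD) for _ in range(20000)]

w_hat, b_hat = fit_energy_dsm(expert_actions, SIGMA, rng)

# For a Gaussian expert, the score of the noise-smoothed action density is
# -(a - mean) / (std^2 + sigma^2), so w should approach 1 / (std^2 + sigma^2).
w_expected = 1.0 / (EXPERT_STD**2 + SIGMA**2)
print(f"w_hat={w_hat:.3f} (expected about {w_expected:.3f}), b_hat={b_hat:.3f}")
```

Since the field is the derivative of a single scalar function, it is conservative by construction. In one dimension this holds trivially; the constraint only becomes restrictive for multi-dimensional actions, which this sketch does not cover.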
