Recovering Hidden Reward in Diffusion-Based Policies

Yanbiao Ji; Qiuchang Li; Yuting Hu; Shaokai Wu; Wenyuan Xie; Guodong Zhang; Qicheng He; Deyi Ji; Yue Ding; Hongtao Lu

arXiv:2605.00623·cs.RO·May 12, 2026

TL;DR

EnergyFlow is a framework that unifies generative action modeling with inverse reinforcement learning (IRL). It recovers rewards and improves policy generalization without adversarial training, and is evaluated on manipulation tasks.

Contribution

The paper introduces EnergyFlow, a method that combines denoising score matching with inverse reinforcement learning (IRL). It reports state-of-the-art imitation performance and extracts rewards without adversarial training.

Findings

01. EnergyFlow achieves state-of-the-art imitation performance.

02. It provides effective reward signals for downstream RL.

03. Structural constraints improve policy generalization.

Abstract

This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar energy function whose gradient is the denoising field. We establish that under maximum-entropy optimality, the score function learned via denoising score matching recovers the gradient of the expert's soft Q-function, enabling reward extraction without adversarial training. Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how score estimation errors propagate to action preferences. Empirically, EnergyFlow achieves state-of-the-art imitation performance on various manipulation tasks while providing an effective reward signal for downstream reinforcement…
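
To make the abstract's central idea concrete, here is a minimal one-dimensional sketch of denoising score matching with a scalar energy whose negative gradient serves as the score field. It is an illustration under stated assumptions, not the authors' implementation: the quadratic energy, the sign convention (score = -dE/da), the noise level `sigma`, the synthetic "expert" actions, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def energy(a, k, m):
    """Hypothetical scalar energy E(a) = 0.5 * k * (a - m)^2 (an assumption)."""
    return 0.5 * k * (a - m) ** 2

def score(a, k, m):
    """Field defined as s(a) = -dE/da, so it is conservative by construction."""
    return -k * (a - m)

def dsm_loss_and_grads(k, m, clean, eps, sigma):
    """Denoising score matching loss and its analytic gradients in (k, m)."""
    noisy = clean + sigma * eps          # perturb expert actions with Gaussian noise
    target = -eps / sigma                # score of the Gaussian perturbation kernel
    resid = score(noisy, k, m) - target
    loss = 0.5 * np.mean(resid ** 2)
    grad_k = np.mean(resid * -(noisy - m))   # d score / dk = -(noisy - m)
    grad_m = np.mean(resid * k)              # d score / dm = k
    return loss, grad_k, grad_m

def fit(clean, sigma=0.5, steps=2000, lr=0.05, seed=0):
    """Plain gradient descent on the DSM loss, resampling noise each step."""
    rng = np.random.default_rng(seed)
    k, m = 1.0, 0.0
    for _ in range(steps):
        eps = rng.standard_normal(clean.shape)
        _, grad_k, grad_m = dsm_loss_and_grads(k, m, clean, eps, sigma)
        k -= lr * grad_k
        m -= lr * grad_m
    return k, m

# Synthetic stand-in for expert actions: Gaussian with mean 2.0, std 0.3.
data_rng = np.random.default_rng(1)
expert_actions = 2.0 + 0.3 * data_rng.standard_normal(4096)
sigma = 0.5
k_hat, m_hat = fit(expert_actions, sigma=sigma)

# For Gaussian data the noised density is N(2.0, 0.3^2 + sigma^2), so the
# fitted curvature should approach 1 / (0.3^2 + sigma^2) and m_hat the mean.
k_expected = 1.0 / (0.3 ** 2 + sigma ** 2)
```

In this toy setting the fitted energy is lowest near the expert mean, so `-energy(a, k_hat, m_hat)` ranks actions by how expert-like they are. Whether and how such a quantity relates to the expert's soft Q-function is the subject of the paper's analysis, which this sketch does not reproduce.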

Code & Models

Repositories

sotaagi/EnergyFlow (GitHub)