Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

TL;DR
The paper introduces AEPO, an offline reinforcement learning method that derives analytic guidance for estimating the intermediate energy in energy-guided diffusion models and outperforms baseline methods on D4RL benchmarks.
Contribution
It provides a theoretical analysis and closed-form solution for energy-guided diffusion models, and develops a neural network approach for log-expectation estimation in offline RL.
Findings
AEPO outperforms baseline methods on D4RL benchmarks.
The method effectively estimates intractable energy functions.
Extensive experiments validate the approach's superiority.
Abstract
Conditional decision generation with diffusion models has proven highly competitive in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and the closed-form solution of the intermediate guidance when the diffusion model obeys the conditional Gaussian transformation. Then, we analyze the posterior Gaussian distribution in the log-expectation formulation and obtain the target estimation of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approach…
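As a rough illustration of why a Gaussian posterior makes the log-expectation tractable, the sketch below compares a Monte Carlo estimate of log E[exp(beta * Q(x))] with its closed form in a one-dimensional toy case. Everything here is an assumption made for illustration and is not taken from the paper: the linear function Q(x) = a * x, the scalar Gaussian x ~ N(mu, sigma^2), and the names `mc_log_expectation` and `closed_form_log_expectation`. With a linear Q, the closed form is simply the Gaussian moment-generating function; the paper's own derivation and assumptions may differ.

```python
import numpy as np


def log_mean_exp(values):
    """Numerically stable log of the sample mean of exp(values)."""
    m = np.max(values)
    return m + np.log(np.mean(np.exp(values - m)))


def mc_log_expectation(mu, sigma, a, beta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of log E[exp(beta * a * x)] for x ~ N(mu, sigma^2).

    Hypothetical toy setup: Q(x) = a * x is an assumed linear stand-in
    for a learned value function.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)
    return log_mean_exp(beta * a * x)


def closed_form_log_expectation(mu, sigma, a, beta):
    """Closed form via the Gaussian moment-generating function:
    log E[exp(t * x)] = t * mu + 0.5 * t^2 * sigma^2, with t = beta * a.
    """
    t = beta * a
    return t * mu + 0.5 * (t * sigma) ** 2


if __name__ == "__main__":
    params = dict(mu=0.5, sigma=0.3, a=2.0, beta=1.5)
    print("Monte Carlo:", mc_log_expectation(**params))
    print("Closed form:", closed_form_log_expectation(**params))
```

The two printed values should agree up to sampling noise. For a nonlinear Q no such exact identity holds in general, which is one way to see why the abstract speaks of a target estimation under mild assumptions.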
Taxonomy
Topics: Traffic control and management · Reinforcement Learning in Robotics · Elevator Systems and Control
Methods: Diffusion
