Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Jifeng Hu; Sili Huang; Zhejian Yang; Shengchao Hu; Li Shen; Hechang Chen; Lichao Sun; Yi Chang; Dacheng Tao

arXiv:2505.01822·cs.LG·May 6, 2025

TL;DR

The paper introduces AEPO, a novel method for offline reinforcement learning that uses analytic guidance to estimate energy functions, improving performance across multiple tasks.

Contribution

It provides a theoretical analysis and closed-form solution for energy-guided diffusion models, and develops a neural network approach for log-expectation estimation in offline RL.

Findings

01 AEPO outperforms baseline methods on D4RL benchmarks.

02 The method effectively estimates intractable energy functions.

03 Extensive experiments validate the approach's superiority.

Abstract

Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process. To address this issue, we propose the Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and the closed-form solution of the intermediate guidance when the diffusion model obeys the conditional Gaussian transformation. Then, we analyze the posterior Gaussian distribution in the log-expectation formulation and obtain the target estimation of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approach…
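To make the log-expectation difficulty concrete, the sketch below compares a sampling-based estimate with a closed form in one simple special case. It is an illustration only, not the paper's method: it assumes a one-dimensional Gaussian posterior `N(mu, sigma^2)` and a hypothetical linear score `a * x + b` scaled by `beta`, for which the Gaussian moment-generating function gives `log E[exp(beta * (a * x + b))] = beta * (a * mu + b) + 0.5 * (beta * a * sigma)^2`. All function and parameter names are made up for this example.

```python
import math
import random


def log_expectation_mc(mu, sigma, a, b, beta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of log E[exp(beta * (a*x + b))], x ~ N(mu, sigma^2).

    Hypothetical helper for illustration; uses log-sum-exp for stability.
    """
    rng = random.Random(seed)
    vals = [beta * (a * rng.gauss(mu, sigma) + b) for _ in range(n_samples)]
    m = max(vals)  # subtract the max before exponentiating
    return m + math.log(sum(math.exp(v - m) for v in vals) / n_samples)


def log_expectation_closed_form(mu, sigma, a, b, beta):
    """Closed form for the same quantity (Gaussian MGF, linear score only)."""
    return beta * (a * mu + b) + 0.5 * (beta * a * sigma) ** 2


# Assumed toy values, chosen only so the two estimates can be compared.
params = dict(mu=0.5, sigma=0.3, a=2.0, b=-1.0, beta=1.5)
mc_value = log_expectation_mc(**params)
exact_value = log_expectation_closed_form(**params)
print(f"Monte Carlo: {mc_value:.4f}  closed form: {exact_value:.4f}")
```

With a linear score the expectation is available analytically, so no sampling is needed. For a general learned score no such formula exists, which is the situation the abstract describes as intractable.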

Videos

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Traffic control and management · Reinforcement Learning in Robotics · Elevator Systems and Control

Methods: Diffusion