Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow
Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

TL;DR
This paper introduces a novel MaxEnt RL framework using Energy-Based Normalizing Flows that unifies policy evaluation and improvement into a single process, enabling efficient training and multi-modal action modeling.
Contribution
The paper proposes a MaxEnt RL approach built on Energy-Based Normalizing Flows that trains with a single objective, supports multi-modal action distributions, and reports stronger performance than existing methods on MuJoCo benchmarks.
Findings
Achieves superior performance on MuJoCo benchmarks
Supports multi-modal action distributions
Enables direct calculation of soft value functions
Abstract
Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow). This framework integrates the policy evaluation steps and the policy improvement steps, resulting in a single objective training process. Our method enables the calculation of the soft value function used in the policy evaluation target without Monte Carlo approximation. Moreover, this design supports the modeling of multi-modal action distributions while…
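The statement that the soft value function can be computed without Monte Carlo approximation can be illustrated with a toy case. In MaxEnt RL the soft value is V(s) = alpha * log of the integral over a of exp(Q(s, a) / alpha), and the corresponding policy is pi(a|s) = exp((Q(s, a) - V(s)) / alpha). The sketch below is not the paper's implementation. It assumes a hypothetical one-dimensional affine flow a = mu + sigma * z with z ~ N(0, 1) for a single fixed state, and all function and parameter names (`soft_q`, `soft_v_exact`, `mu`, `sigma`, `alpha`) are illustrative. The flow's log-density splits into an action-dependent term, treated here as Q / alpha, and an action-independent log-Jacobian term, which yields V in closed form. A numerical quadrature is included only to check the closed form.

```python
import math


def log_std_normal(z):
    """Log-density of the standard normal prior at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def soft_q(a, mu, sigma, alpha):
    """Action-dependent part of the flow log-density, scaled by alpha.

    Assumption: for the toy flow a = mu + sigma * z, the prior term
    log N((a - mu) / sigma) plays the role of Q(s, a) / alpha.
    """
    z = (a - mu) / sigma
    return alpha * log_std_normal(z)


def soft_v_exact(sigma, alpha):
    """Closed-form soft value for the toy flow.

    The integral of exp(Q / alpha) over a equals sigma, so
    V = alpha * log(sigma). No sampling is required.
    """
    return alpha * math.log(sigma)


def soft_v_quadrature(mu, sigma, alpha, lo=-50.0, hi=50.0, n=20000):
    """Midpoint-rule estimate of alpha * log integral exp(Q / alpha) da."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        a = lo + (i + 0.5) * h
        total += math.exp(soft_q(a, mu, sigma, alpha) / alpha)
    return alpha * math.log(total * h)


def log_policy(a, mu, sigma, alpha):
    """log pi(a|s) = (Q(s, a) - V(s)) / alpha."""
    return (soft_q(a, mu, sigma, alpha) - soft_v_exact(sigma, alpha)) / alpha
```

For example, `soft_v_exact(2.0, 0.2)` and `soft_v_quadrature(0.5, 2.0, 0.2)` should agree to numerical precision, and `log_policy` reduces to the ordinary Gaussian log-density with mean `mu` and standard deviation `sigma`. In a real flow with several layers, the same split would apply per layer, under the assumption that some layers have Jacobians that do not depend on the action.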
Taxonomy
Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications
Methods: Normalizing Flows
