Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Chen-Hao Chao; Chien Feng; Wei-Fang Sun; Cheng-Kuang Lee; Simon See; Chun-Yi Lee

arXiv:2405.13629 · cs.LG · October 29, 2024

TL;DR

This paper introduces a novel MaxEnt RL framework using Energy-Based Normalizing Flows that unifies policy evaluation and improvement into a single process, enabling efficient training and multi-modal action modeling.

Contribution

The paper proposes a MaxEnt RL approach based on Energy-Based Normalizing Flows that replaces alternating actor-critic updates with a single training objective, supports multi-modal action distributions, and reports superior performance on MuJoCo benchmarks.

Findings

01 Achieves superior performance on MuJoCo benchmarks

02 Supports multi-modal action distributions

03 Enables direct calculation of soft value functions
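
The multi-modality finding can be illustrated with a minimal sketch of a one-dimensional flow. The transform `forward`, its parameters `C` and `K`, and the helper `count_modes` are hypothetical choices made for illustration and are not taken from the paper or its repository. The sketch pushes a standard normal latent through a strictly increasing nonlinear map and evaluates the resulting action density with the change-of-variables formula; setting `c=0.0` recovers a plain Gaussian for comparison.

```python
import math

C, K = 2.0, 3.0  # hypothetical shape parameters, chosen for illustration


def forward(z, c=C, k=K):
    """Invertible map from latent z to action a (strictly increasing for c, k >= 0)."""
    return z + c * math.tanh(k * z)


def log_density(z, c=C, k=K):
    """log p(a) at a = forward(z), via the change-of-variables formula."""
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Derivative of the forward map: da/dz = 1 + c * k * sech^2(k * z)
    dadz = 1.0 + c * k * (1.0 - math.tanh(k * z) ** 2)
    return log_base - math.log(dadz)


def count_modes(z_grid, c=C, k=K):
    """Count strict local maxima of the action density along a latent grid.

    Because forward() is monotonic, maxima over z correspond to maxima over a.
    """
    vals = [log_density(z, c, k) for z in z_grid]
    return sum(
        1
        for i in range(1, len(vals) - 1)
        if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]
    )


if __name__ == "__main__":
    grid = [(i - 400) / 100.0 for i in range(801)]  # z in [-4, 4]
    print("modes with nonlinear flow:", count_modes(grid))
    print("modes with c = 0 (Gaussian):", count_modes(grid, c=0.0))
```

The map stretches the region around zero, which thins out the density there and leaves a peak on either side, something a single Gaussian policy cannot represent.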

Abstract

Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance with the updated soft Q-function. In this paper, we introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow). This framework integrates the policy evaluation steps and the policy improvement steps, resulting in a single objective training process. Our method enables the calculation of the soft value function used in the policy evaluation target without Monte Carlo approximation. Moreover, this design supports the modeling of multi-modal action distributions while…
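
The claim about computing the soft value function without Monte Carlo approximation can be made concrete with a toy sketch. It assumes a one-dimensional affine flow with hypothetical parameters `mu` and `sigma` and an assumed temperature `ALPHA`; the split of the flow's log-density into an action-dependent energy term (treated as the soft Q-function) and an action-independent Jacobian term (treated as the soft value) is an illustrative reading of the energy-based flow idea, not code from the paper's repository. Under those assumptions the soft value is available in closed form, and the sketch checks it against a brute-force integral and a sampled estimate.

```python
import math
import random

ALPHA = 0.2  # temperature; assumed value for illustration


def std_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def soft_q(a, mu, sigma, alpha=ALPHA):
    """Energy part: alpha * log p_z(g(a)), with g(a) = (a - mu) / sigma."""
    return alpha * std_normal_logpdf((a - mu) / sigma)


def soft_v_exact(sigma, alpha=ALPHA):
    """Normalizer part: -alpha * log|dg/da| = alpha * log(sigma). No sampling."""
    return alpha * math.log(sigma)


def soft_v_quadrature(mu, sigma, alpha=ALPHA, n=20001, width=10.0):
    """Trapezoidal estimate of alpha * log(integral of exp(Q(a)/alpha) da)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(soft_q(lo + i * h, mu, sigma, alpha) / alpha)
    return alpha * math.log(total * h)


def soft_v_monte_carlo(mu, sigma, alpha=ALPHA, n=50000, width=6.0, seed=0):
    """Uniform-proposal Monte Carlo estimate of the same integral."""
    rng = random.Random(seed)
    lo, hi = mu - width * sigma, mu + width * sigma
    acc = 0.0
    for _ in range(n):
        acc += math.exp(soft_q(rng.uniform(lo, hi), mu, sigma, alpha) / alpha)
    return alpha * math.log((hi - lo) * acc / n)


def log_policy(a, mu, sigma, alpha=ALPHA):
    """log pi(a) = (Q(a) - V) / alpha, which is the Gaussian log-density here."""
    return (soft_q(a, mu, sigma, alpha) - soft_v_exact(sigma, alpha)) / alpha


if __name__ == "__main__":
    mu, sigma = 0.3, 0.5
    print("exact      :", soft_v_exact(sigma))
    print("quadrature :", soft_v_quadrature(mu, sigma))
    print("monte carlo:", soft_v_monte_carlo(mu, sigma))
```

In this toy case the closed-form value costs one logarithm, while the sampled estimate needs many evaluations of the energy and still carries noise.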

Code & Models

Repositories

ChienFeng-hub/meow · jax · Official

Videos

Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications

Methods: Normalizing Flows