Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR

Yuyang Zhang; Yang Hu; Bo Dai; Na Li

arXiv:2512.23870·cs.LG·January 1, 2026

Open Access

TL;DR

This paper introduces a flow-based policy parameterization for max-entropy reinforcement learning to improve expressiveness and robustness over simple policy classes. It is supported by a theoretical analysis and a case study on LQR problems in which the method learns the optimal policy.

Contribution

The paper proposes a flow-based policy for soft actor-critic, updated with an online variant of flow matching called importance sampling flow matching (ISFM), and analyzes how the choice of sampling distribution affects learning.

Findings

01. The flow-based policy achieves high expressiveness.

02. The ISFM algorithm effectively learns optimal policies.

03. Theoretical analysis links sampling choices to learning efficiency.

Abstract

Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for efficiency, sacrificing expressiveness and robustness. In this paper, we propose a variant of the SAC algorithm that parameterizes the policy with flow-based models, leveraging their rich expressiveness. In the algorithm, we evaluate the flow-based policy using the instantaneous change-of-variable technique and update the policy with an online variant of flow matching developed in this paper. This online variant, termed importance sampling flow matching (ISFM), enables policy updates using only samples from a user-specified sampling distribution rather than from the unknown target distribution. We develop a theoretical analysis of ISFM, characterizing how different choices of sampling distributions…
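The abstract does not spell out the ISFM objective, so the following is only a minimal sketch of the general idea it describes: reweighting samples from a user-chosen sampling distribution so that a flow matching loss targets an energy-based distribution proportional to exp(Q(a)/alpha). Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional action, the hypothetical Q-function `Q(a) = -(a - 2)^2 / 2`, the temperature `alpha`, the Gaussian proposal with `sigma = 3`, the straight-line conditional path, and the two constant candidate velocity fields.

```python
import math
import random


def log_target_unnorm(a, alpha=1.0):
    """Unnormalized log-density Q(a)/alpha for a hypothetical Q(a) = -(a - 2)^2 / 2."""
    return -0.5 * (a - 2.0) ** 2 / alpha


def log_proposal(a, sigma=3.0):
    """Log-density of the assumed sampling distribution N(0, sigma^2)."""
    return -0.5 * (a / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def self_normalized_weights(actions, alpha=1.0, sigma=3.0):
    """Weights proportional to exp(Q(a)/alpha) / q(a), normalized to sum to one."""
    log_w = [log_target_unnorm(a, alpha) - log_proposal(a, sigma) for a in actions]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def weighted_fm_loss(velocity, actions, weights, rng):
    """Importance-weighted conditional flow matching loss on a straight-line path."""
    loss = 0.0
    for a1, w in zip(actions, weights):
        x0 = rng.gauss(0.0, 1.0)       # base noise sample
        t = rng.random()               # time drawn uniformly from [0, 1)
        xt = (1.0 - t) * x0 + t * a1   # point on the line from x0 to a1
        loss += w * (velocity(xt, t) - (a1 - x0)) ** 2
    return loss


rng = random.Random(0)
# Actions come from the proposal, never from the target itself.
actions = [rng.gauss(0.0, 3.0) for _ in range(20000)]
weights = self_normalized_weights(actions)
weighted_mean = sum(w * a for w, a in zip(weights, actions))

# Two hypothetical constant velocity fields, scored with identical noise draws.
loss_toward_target = weighted_fm_loss(lambda x, t: 2.0, actions, weights, random.Random(1))
loss_toward_proposal = weighted_fm_loss(lambda x, t: 0.0, actions, weights, random.Random(1))
```

With `alpha = 1` the assumed target is a unit-variance Gaussian centered at 2, so `weighted_mean` should land close to 2 even though every action was drawn from a proposal centered at 0. For the same reason, the velocity field that pushes mass toward 2 should score a lower weighted loss than the one that leaves it at the proposal mean.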

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Robot Manipulation and Learning