Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling

Xubin Zhou; Yipeng Yang; Zhan Li

arXiv:2604.09159 · cs.LG · April 13, 2026

TL;DR

The paper introduces Truncated Rectified Flow Policy (TRFP), a generative policy framework for maximum entropy RL that models multimodal action distributions, trains stably, and supports one-step sampling. It outperforms strong baselines on most MuJoCo benchmarks.

Contribution

TRFP offers a hybrid deterministic-stochastic architecture that makes entropy-regularized optimization tractable and supports stable, efficient one-step sampling in generative policies.

Findings

01. TRFP captures multimodal behavior effectively.

02. TRFP outperforms strong baselines on most MuJoCo benchmarks.

03. TRFP remains competitive under one-step sampling.

Abstract

Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its standard Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distributions. This limitation has motivated increasing interest in generative policies based on diffusion and flow matching as more expressive alternatives. However, incorporating such policies into MaxEnt RL is challenging for two main reasons: the likelihood and entropy of continuous-time generative policies are generally intractable, and multi-step sampling introduces both long-horizon backpropagation instability and substantial inference latency. To address these challenges, we propose Truncated Rectified Flow Policy (TRFP), a framework built on a hybrid deterministic-stochastic architecture. This design makes entropy-regularized…
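The sampling-cost point in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `euler_sample`, `straight_velocity`, and `curved_velocity`, and the constants `A` and `B`, are hypothetical, and the hand-written velocity fields stand in for a learned policy network. It shows that when flow trajectories are straight lines traversed at constant speed, a single Euler step reaches the same endpoint as many steps, whereas a curved field needs many steps to be integrated accurately.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = x0, 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt)
    return x

# Hypothetical straight field: each path runs in a straight line from x0 to
# A * x0 + B at constant speed (an assumption made for illustration only).
A, B = 0.5, 2.0

def straight_velocity(x, t):
    x_start = (x - t * B) / ((1.0 - t) + t * A)  # recover the path's start point
    return (A - 1.0) * x_start + B               # constant along the path

def curved_velocity(x, t):
    return -x  # exponential decay: paths bend, so coarse Euler steps are inaccurate

x0 = 1.0
straight_one = euler_sample(straight_velocity, x0, 1)
straight_many = euler_sample(straight_velocity, x0, 100)
curved_one = euler_sample(curved_velocity, x0, 1)
curved_many = euler_sample(curved_velocity, x0, 100)

print("straight field, 1 step vs 100 steps:", straight_one, straight_many)
print("curved field,   1 step vs 100 steps:", curved_one, curved_many)
```

In this toy setting the two straight-field results agree up to floating-point error, while the curved-field results differ substantially. Each Euler step corresponds to one forward pass of the policy at inference time, and to one more layer of backpropagation during training.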
