Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization
Ruijie Hao, Longfei Zhang, Yang Dai, Yang Ma, Xingxing Liang, Guangquan Cheng

TL;DR
This paper introduces FP-DRL, a novel reinforcement learning algorithm that represents the policy with flow matching and models the full return distribution with distributional RL. This lets it handle multimodal solutions and achieve state-of-the-art results on MuJoCo tasks.
Contribution
It combines a flow-based policy with distributional RL so that the agent can represent complex, multimodal action distributions and perform better on control tasks.
Findings
Achieves state-of-the-art performance on MuJoCo benchmarks.
Effectively models complex, multimodal policies.
Demonstrates the superior representational capability of flow-based policies.
Abstract
Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. However, most traditional RL algorithms have two limitations. First, the policy is typically parameterized as a diagonal Gaussian distribution, which prevents it from capturing multimodal distributions and makes it difficult to cover the full range of optimal solutions in multi-solution problems. Second, the return is reduced to its mean value, which discards its multimodal nature and thus provides insufficient guidance for policy updates. To address these problems, we propose an RL algorithm termed flow-based policy with distributional RL (FP-DRL). This algorithm models the policy using flow matching, which offers both computational efficiency and the capacity to fit complex distributions. Additionally, it employs a distributional RL approach to model and optimize the entire return distribution,…
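The abstract names two ingredients without spelling out their form, so the following sketch illustrates each under explicit assumptions. For the policy, it uses the standard conditional flow-matching recipe on a straight interpolation path: `flow_matching_target` builds the regression target, and `sample_action` turns Gaussian noise into an action by Euler-integrating a velocity field. To stay runnable without training, an analytic velocity field (`exact_velocity`) for a hypothetical two-mode action distribution at -1 and +1 replaces the learned, state-conditioned network. For the return distribution, it uses the quantile Huber loss from QR-DQN; this is a swapped-in technique, since the abstract as shown does not say which distributional method FP-DRL uses. All names are hypothetical and not taken from the authors' code.

```python
import math
import random

# Hypothetical 1-D setting: the "optimal actions" for some fixed state are
# -1 and +1 with equal probability (a two-solution problem). State
# conditioning is omitted to keep the sketch small.
MODES = [-1.0, 1.0]
MODE_WEIGHTS = [0.5, 0.5]


def flow_matching_target(x0, x1, t):
    """Conditional flow-matching pair on the straight path x_t = (1-t)x0 + t*x1.

    Returns (x_t, x1 - x0); a learned velocity network v(x_t, t, state) would
    be regressed onto x1 - x0 with a squared-error loss.
    """
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0


def exact_velocity(x, t):
    """Marginal velocity E[x1 - x0 | x_t = x] for x0 ~ N(0, 1) and the
    two-mode target above; stands in for a trained network. Valid for t < 1."""
    sigma = 1.0 - t
    # Given x1 = a, x_t ~ N(t*a, sigma^2); log posterior weight of each mode.
    logs = [math.log(w) - (x - t * a) ** 2 / (2.0 * sigma ** 2)
            for a, w in zip(MODES, MODE_WEIGHTS)]
    m = max(logs)  # subtract the max before exp to avoid underflow
    probs = [math.exp(l - m) for l in logs]
    z = sum(probs)
    # Given x_t = x and x1 = a, the conditional velocity is (a - x) / sigma.
    return sum(p / z * (a - x) / sigma for p, a in zip(probs, MODES))


def sample_action(velocity, steps=100, rng=random):
    """Draw x0 ~ N(0, 1) and integrate dx/dt = velocity(x, t) from t=0 to t=1
    with forward Euler steps (the last evaluation is at t = 1 - 1/steps)."""
    x = rng.gauss(0.0, 1.0)
    dt = 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt)
    return x


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """QR-DQN-style quantile Huber loss (a swapped-in distributional RL loss).

    pred_quantiles: N predicted return quantiles at midpoints tau_i = (2i+1)/(2N).
    target_samples: samples of the target return, e.g. r + gamma * Z(s', a').
    Returns the mean loss over all (quantile, target-sample) pairs.
    """
    n = len(pred_quantiles)
    total = 0.0
    for i, theta in enumerate(pred_quantiles):
        tau = (2 * i + 1) / (2 * n)
        for y in target_samples:
            u = y - theta  # TD error for this quantile/sample pair
            if abs(u) <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs(u) - 0.5 * kappa)
            # Asymmetric weight: over-estimates (u < 0) are weighted by 1 - tau.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    return total / (n * len(target_samples))
```

With `rng = random.Random(0)`, repeated calls to `sample_action(exact_velocity, rng=rng)` land at or extremely close to -1 or +1 in roughly equal proportion. A single Gaussian fitted to those samples would instead center near 0, between the two modes, which is the kind of limitation the abstract attributes to diagonal Gaussian policies.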
