Decision Flow Policy Optimization
Jifeng Hu, Sili Huang, Siyuan Guo, Zhaogeng Liu, Li Shen, Lichao Sun, Hechang Chen, Yi Chang, Dacheng Tao

TL;DR
This paper introduces Decision Flow, a unified framework that integrates multi-modal action distribution modeling with policy optimization in reinforcement learning, leading to improved performance in offline RL tasks.
Contribution
The paper proposes Decision Flow, a novel method that seamlessly combines flow-based generative models with policy optimization for reinforcement learning.
Findings
Matches or exceeds state-of-the-art performance in offline RL environments.
Effectively models complex multi-modal action distributions.
Demonstrates superior control in continuous action spaces.
Abstract
In recent years, generative models have shown remarkable capabilities across diverse fields, including images, videos, language, and decision-making. By applying powerful generative models such as flow-based models to reinforcement learning, we can effectively model complex multi-modal action distributions and achieve superior robotic control in continuous action spaces, surpassing the limitations of the single-modal action distributions of traditional Gaussian-based policies. Previous methods usually adopt generative models as behavior models to fit state-conditioned action distributions from datasets, with policy optimization conducted separately through additional policies using value-based sample weighting or gradient-based updates. However, this separation prevents the simultaneous optimization of multi-modal distribution fitting and policy improvement, ultimately hindering the…
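To make the abstract's two points concrete, here is a minimal toy sketch in plain Python. It is not the paper's method. It assumes a hypothetical one-dimensional action dataset with two modes near -1 and +1, and a hypothetical critic `toy_q_value`; both are invented for illustration. The sketch shows (1) how a single Gaussian fit places its mean between the modes, where no data lies, and (2) how a separate value-based sample-weighting step, of the kind the abstract attributes to previous methods, picks among behavior samples after the fact.

```python
import math
import random
import statistics


def sample_bimodal_actions(n, rng):
    """Hypothetical behavior data: actions cluster near -1 or +1 (two modes)."""
    return [rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 0.05) for _ in range(n)]


def toy_q_value(action):
    """Hypothetical critic (an assumption) that prefers the mode near +1."""
    return -(action - 1.0) ** 2


def value_weighted_choice(candidates, temperature, rng):
    """Pick one candidate with probability proportional to exp(Q / temperature)."""
    scores = [toy_q_value(a) / temperature for a in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
actions = sample_bimodal_actions(2000, rng)

# Maximum-likelihood fit of a single Gaussian: the mean lands between the modes.
gaussian_mean = statistics.fmean(actions)
gaussian_std = statistics.pstdev(actions)
distance_to_nearest_data = min(abs(a - gaussian_mean) for a in actions)

# Separate improvement step: reweight samples from the behavior data by value.
candidates = sample_bimodal_actions(64, rng)
chosen = value_weighted_choice(candidates, temperature=0.1, rng=rng)

print(f"Gaussian mean: {gaussian_mean:.3f}, std: {gaussian_std:.3f}")
print(f"Distance from Gaussian mean to nearest data point: {distance_to_nearest_data:.3f}")
print(f"Value-weighted action: {chosen:.3f}")
```

In this toy setup the Gaussian mean sits far from every observed action, while the value-weighted step can only choose among actions the behavior sampler already produced. That two-stage split is the separation the abstract describes.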
Taxonomy
Topics: Big Data and Business Intelligence
