TL;DR
This paper introduces flow map policies for fast, flexible action generation in complex control tasks, combining theoretical insights with practical algorithms to improve offline-to-online reinforcement learning performance.
Contribution
It proposes flow map policies for rapid action sampling, derives a new Q-guidance learning target, and develops a stochastic sampler for iterative inference, advancing offline-to-online RL.
Findings
FMQ, the paper's method, outperforms previous methods with a 21.3% improvement in success rate.
Flow map policies can take arbitrary-size jumps, including one-step jumps, across the generative dynamics, which reduces inference latency.
The approach achieves state-of-the-art results across 12 robotic tasks.
Abstract
Generative policies based on expressive model classes, such as diffusion and flow matching, are well-suited to complex control problems with highly multimodal action distributions. Their expressivity, however, comes at a significant inference cost: generating each action typically requires simulating many steps of the generative process, compounding latency across sequential decision-making rollouts. We introduce flow map policies, a novel class of generative policies designed for fast action generation by learning to take arbitrary-size jumps, including one-step jumps, across the generative dynamics of existing flow-based policies. We instantiate flow map policies for offline-to-online reinforcement learning (RL) and formulate online adaptation as a trust-region optimization problem that improves the critic's Q-value while remaining close to the offline policy. We theoretically derive…
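To make the idea of "jumping across the generative dynamics" concrete, here is a minimal toy sketch. It is not the paper's model. It assumes a hypothetical one-dimensional linear velocity field, dx/dt = a·x, whose flow map is known in closed form, and compares many-step Euler integration with a single jump. The names `velocity`, `euler_sample`, and `flow_map` are illustrative only; in a flow map policy the jump would be a learned network conditioned on the state, not an analytic formula.

```python
import math

def velocity(x, t, a=-1.0):
    # Hypothetical toy velocity field dx/dt = a * x; stands in for a
    # learned flow-based policy (assumption, not from the paper).
    return a * x

def euler_sample(x0, n_steps, a=-1.0):
    # Conventional sampling: integrate the ODE from t=0 to t=1 with
    # explicit Euler, costing one velocity evaluation per step.
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt, a)
    return x

def flow_map(x, s, t, a=-1.0):
    # Exact flow map X_{s,t}(x) = x * exp(a * (t - s)) for this toy ODE.
    # It transports a point from time s to time t in a single evaluation.
    return x * math.exp(a * (t - s))

x0 = 2.0
one_jump = flow_map(x0, 0.0, 1.0)                        # one evaluation
two_jumps = flow_map(flow_map(x0, 0.0, 0.4), 0.4, 1.0)   # jumps compose
many_euler = euler_sample(x0, 1000)                      # 1000 evaluations
coarse_euler = euler_sample(x0, 2)                       # cheap but inaccurate

print(one_jump, two_jumps, many_euler, coarse_euler)
```

In this toy setting a single jump matches the fine-grained Euler result closely, while a two-step Euler integration does not. Jumps of different sizes also compose consistently, which is the property that lets the jump size be chosen freely at inference time.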
