Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning
Quentin Rouxel (CUHK), Clemente Donoso, Fei Chen (CUHK), Serena Ivaldi, Jean-Baptiste Mouret

TL;DR
This paper introduces a goal-conditioned reinforcement learning method that uses Flow Matching to estimate the extrema (minimum or maximum) of a learned distribution, so that policies can be trained from diverse, suboptimal demonstrations and deployed on a humanoid robot for complex manipulation tasks.
Contribution
It uses two properties of Flow Matching, deterministic transport and support for arbitrary source distributions, to build goal-conditioned imitation and reinforcement learning algorithms, validated on both benchmark tasks and real humanoid robot tasks.
Findings
Effective in diverse demonstration scenarios
Successful deployment on a humanoid robot for complex tasks
Improved performance over existing methods
Abstract
Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert demonstrations. This limitation can be mitigated by leveraging suboptimal, open-ended play data, often easier to collect and offering greater diversity. This work builds upon recent advances in generative modeling, specifically Flow Matching, an alternative to Diffusion models. We introduce a method for estimating the minimum or maximum of the learned distribution by leveraging the unique properties of Flow Matching, namely, deterministic transport and support for arbitrary source distributions. We apply this method to develop several goal-conditioned imitation and reinforcement learning algorithms based on Flow Matching, where policies are conditioned on both current and goal observations. We explore and…
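The abstract names two Flow Matching properties, deterministic transport and support for arbitrary source distributions, as the basis for estimating a minimum or maximum. The sketch below is one plausible illustration of why those properties could help, not the authors' algorithm: in one dimension, a deterministic flow started from a bounded source (here uniform on [0, 1]) preserves ordering, so the endpoints of the source are carried to the endpoints of the target. The velocity field is written by hand for a target that is uniform on [lo, hi]; in practice it would be a learned network. The names `velocity`, `transport`, `lo`, and `hi` are hypothetical and chosen only for this example.

```python
import numpy as np

def velocity(x, t, lo, hi):
    """Hand-written velocity of the straight-line flow from U[0, 1] to U[lo, hi].

    Assumption for illustration only: a trained Flow Matching model would
    supply this field instead of a closed-form expression.
    """
    scale = (1.0 - t) + t * (hi - lo)   # dx_t / dx_0 along the flow, always > 0
    x0 = (x - t * lo) / scale           # recover the source point of this trajectory
    return lo + (hi - lo - 1.0) * x0    # constant velocity along each trajectory

def transport(x0, lo, hi, steps=100):
    """Integrate the flow from t=0 to t=1 with explicit Euler steps."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt, lo, hi)
    return x

lo, hi = 2.0, 5.0

# Transporting the two endpoints of the bounded source gives the target extrema.
est_min, est_max = transport(np.array([0.0, 1.0]), lo, hi)

# Ordering is preserved: sorted source points stay sorted after transport.
grid = np.linspace(0.0, 1.0, 11)
mapped = transport(grid, lo, hi)
```

With a Gaussian source there is no finite endpoint to transport, which is one reason the freedom to choose the source distribution matters in this reading.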
