Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Lingkai Kong, Haichuan Wang, Tonghan Wang, Guojun Xiong, Milind Tambe

TL;DR
This paper introduces CompFlow, a reinforcement learning framework that uses flow matching and optimal transport to handle offline data collected under shifted dynamics, improving sample efficiency and robustness.
Contribution
CompFlow models the online dynamics as a conditional flow built on the output distribution of a pretrained offline flow, rather than on offline data in general. This provides a stable Wasserstein-distance-based estimator of the dynamics gap and an active exploration strategy.
Findings
Outperforms strong baselines on shifted-dynamics RL benchmarks
Provides a stable Wasserstein distance-based dynamics gap estimator
Theoretically reduces the performance gap to the optimal policy
Abstract
Incorporating pre-collected offline data can substantially improve the sample efficiency of reinforcement learning (RL), but its benefits can break down when the transition dynamics in the offline dataset differ from those encountered online. Existing approaches typically mitigate this issue by penalizing or filtering offline transitions in regions with a large dynamics gap. However, their dynamics-gap estimators often rely on KL divergence or mutual information, which can be ill-defined when offline and online dynamics have mismatched support. To address this challenge, we propose CompFlow, a principled framework built on the theoretical connection between flow matching and optimal transport. Specifically, we model the online dynamics as a conditional flow built upon the output distribution of a pretrained offline flow, rather than learning it directly from a Gaussian prior. This…
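The abstract's point about mismatched support can be illustrated with a toy one-dimensional example. The sketch below is not CompFlow's estimator. It only contrasts KL divergence with the Wasserstein-1 distance on two hypothetical sets of next-state samples whose supports do not overlap. The sample values, the two-bin histograms, and the function names are all assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions on shared bins.

    Returns infinity when q has zero mass in a bin where p has mass,
    which is the mismatched-support case.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # bins with no p-mass contribute nothing
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def wasserstein1_1d(xs, ys):
    """W1 between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling pairs sorted samples,
    so W1 is the mean absolute difference of the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical next-state samples for the same (state, action) pair.
offline_next = [0.0, 1.0, 2.0, 3.0]      # offline (shifted) dynamics
online_next = [10.0, 11.0, 12.0, 13.0]   # online dynamics

# Histograms over two bins, [0, 5) and [10, 15): the supports are disjoint.
p_online = [0.0, 1.0]
p_offline = [1.0, 0.0]

kl = kl_divergence(p_online, p_offline)          # infinite: no usable gap signal
w1 = wasserstein1_1d(offline_next, online_next)  # finite: grows with the shift

print("KL:", kl)
print("W1:", w1)
```

In this toy setting the KL value is infinite regardless of how far apart the two sample sets are, while the Wasserstein-1 value stays finite and scales with the size of the shift, which is the property that makes a transport-based gap estimate usable for penalizing or filtering transitions.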
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
