Flow Matching for Offline Reinforcement Learning with Discrete Actions
Fairoz Nower Khan, Nabuat Zaman Nahim, Ruiquan Huang, Haibo Yang, Peizhong Ju

TL;DR
This paper extends flow matching techniques to discrete action spaces in offline reinforcement learning, enabling multi-objective and multi-agent applications with theoretical guarantees and strong empirical performance.
Contribution
The paper introduces a discrete flow matching framework built on continuous-time Markov chains, extends it to multi-objective and multi-agent settings, and shows theoretically that, under idealized conditions, optimizing the training objective recovers the optimal policy.
Findings
Outperforms traditional offline RL methods in diverse benchmarks.
Supports high-dimensional control, multi-agent games, and changing objectives.
Can be applied to continuous-control problems via action quantization.
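To illustrate the action quantization mentioned in the last finding, here is a minimal sketch of uniform per-dimension binning. The summary does not say which quantization scheme the paper uses, so the uniform bins and the helper names `quantize_actions` and `dequantize_actions` are assumptions made for illustration only.

```python
import numpy as np

def quantize_actions(actions, low, high, num_bins):
    """Map continuous actions to integer bin indices (assumed uniform bins)."""
    actions = np.asarray(actions, dtype=np.float64)
    # Normalize each dimension to [0, 1], clipping out-of-range values.
    unit = np.clip((actions - low) / (high - low), 0.0, 1.0)
    # Scale to [0, num_bins] and truncate; the upper edge falls in the last bin.
    return np.minimum((unit * num_bins).astype(np.int64), num_bins - 1)

def dequantize_actions(bins, low, high, num_bins):
    """Map bin indices back to the centers of their bins."""
    bins = np.asarray(bins, dtype=np.float64)
    return low + (bins + 0.5) / num_bins * (high - low)

# Hypothetical usage: a 3-dimensional action in [-1, 1] with 4 bins per dimension.
bins = quantize_actions([-1.0, 0.0, 1.0], low=-1.0, high=1.0, num_bins=4)
recovered = dequantize_actions(bins, low=-1.0, high=1.0, num_bins=4)
```

With this scheme the round-trip error per dimension is at most half a bin width, so a finer grid trades a larger discrete action space for lower quantization error.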
Abstract
Generative policies based on diffusion models and flow matching have shown strong promise for offline reinforcement learning (RL), but their applicability remains largely confined to continuous action spaces. To address a broader range of offline RL settings, we extend flow matching to a general framework that supports discrete action spaces with multiple objectives. Specifically, we replace continuous flows with continuous-time Markov chains, trained using a Q-weighted flow matching objective. We then extend our design to multi-agent settings, mitigating the exponential growth of joint action spaces via a factorized conditional path. We theoretically show that, under idealized conditions, optimizing this objective recovers the optimal policy. Extensive experiments further demonstrate that our method performs robustly across diverse settings and benchmarks, including high-dimensional…
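The abstract names a Q-weighted flow matching objective but does not give its form. The sketch below shows one generic way such a loss could look for discrete actions, under three assumptions that are not taken from the paper: a mixture conditional path (keep the dataset action with probability t, otherwise draw a uniform source action), exponential weighting by Q-values with a temperature, and a model that predicts logits over the dataset action. All function names are hypothetical.

```python
import numpy as np

def sample_conditional_path(a1, t, num_actions, rng):
    """Assumed mixture path: keep data action a1 w.p. t, else a uniform source action."""
    a0 = rng.integers(0, num_actions, size=a1.shape)
    keep = rng.random(a1.shape) < t
    return np.where(keep, a1, a0)

def q_weights(q_values, temperature=1.0):
    """Assumed weighting: exp(Q / temperature), rescaled so the batch mean is 1."""
    z = (q_values - q_values.max()) / temperature  # subtract max for stability
    w = np.exp(z)
    return w * len(w) / w.sum()

def q_weighted_cross_entropy(logits, a1, weights):
    """Weighted negative log-likelihood of the data action under the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(a1)), a1]
    return float(np.mean(weights * nll))

# Hypothetical batch: 4 transitions, 5 discrete actions, placeholder Q-values.
rng = np.random.default_rng(0)
a1 = np.array([0, 3, 2, 4])
t = rng.random(a1.shape)
a_t = sample_conditional_path(a1, t, num_actions=5, rng=rng)
weights = q_weights(np.array([1.0, 0.5, 2.0, 0.0]))
# A real model would map (state, a_t, t) to logits; zeros stand in here.
loss = q_weighted_cross_entropy(np.zeros((4, 5)), a1, weights)
```

In this sketch, transitions with higher Q-values contribute more to the loss, so the learned denoising distribution is tilted toward high-value actions rather than simply cloning the dataset.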
