TL;DR
FlowIQN introduces a Wasserstein-aligned flow-matching critic for distributional reinforcement learning, improving return distribution accuracy and compatibility with DRL frameworks.
Contribution
It proposes a novel quantile-coupled flow matching critic with theoretical Wasserstein alignment guarantees, addressing a key metric mismatch in existing CFM critics.
Findings
FlowIQN outperforms other CFM critics in Wasserstein return-distribution accuracy.
It achieves competitive performance on offline RL benchmarks.
The method provides a theoretically grounded approach compatible with DRL pipelines.
Abstract
Unlike standard expected-return Reinforcement Learning (RL), Distributional RL (DRL) models the full return distribution, making it better suited for uncertainty-aware and risk-sensitive decision-making. Conditional Flow Matching (CFM) critics have recently attracted attention for modelling continuous, multi-modal return distributions. Despite this interest, there remains a substantial metric mismatch: DRL theory relies on the distributional Bellman operator being contractive in the p-Wasserstein distance, yet existing CFM critics are trained with arbitrary source-target couplings, so their flow-matching losses are not Wasserstein-aligned surrogates for matching Bellman target return distributions. In this work, we address this mismatch by proposing FlowIQN, a CFM critic that sorts source and Bellman target samples within each mini-batch to approximate the monotone optimal transport…
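The sorting step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the straight-line interpolation path, and the toy Bellman targets are all assumptions made for illustration. It relies only on a standard fact about one-dimensional optimal transport: pairing the i-th smallest source sample with the i-th smallest target sample is the monotone coupling, which minimizes the transport cost between two equal-sized empirical distributions for any convex cost.

```python
import numpy as np


def sorted_coupling(source, target):
    """Pair the i-th smallest source sample with the i-th smallest target sample.

    For equal-sized 1-D samples this is the monotone (optimal transport) coupling.
    """
    return np.sort(source), np.sort(target)


def cfm_regression_pairs(source, target, rng):
    """Build flow-matching regression data from the sorted coupling.

    Assumes a straight-line path x_t = (1 - t) * x0 + t * x1, whose velocity
    is the constant x1 - x0. The path choice is an illustrative assumption.
    """
    x0, x1 = sorted_coupling(source, target)
    t = rng.uniform(size=x0.shape)      # one random time per coupled pair
    x_t = (1.0 - t) * x0 + t * x1       # point on the path at time t
    v_target = x1 - x0                  # velocity a critic would regress onto
    return t, x_t, v_target


def empirical_w1(a, b):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = 256
    source = rng.standard_normal(batch)              # hypothetical base noise
    rewards = rng.uniform(size=batch)                # toy rewards
    next_returns = rng.standard_normal(batch)        # toy next-state returns
    target = rewards + 0.99 * next_returns           # toy Bellman target samples

    t, x_t, v_target = cfm_regression_pairs(source, target, rng)
    arbitrary_cost = np.mean(np.abs(source - target))  # unsorted pairing
    print("cost of arbitrary pairing:", arbitrary_cost)
    print("cost of sorted pairing (W1):", empirical_w1(source, target))
```

In this sketch the mean absolute displacement of the sorted pairing equals the empirical 1-Wasserstein distance, while an arbitrary pairing can only cost the same or more, which is the kind of gap the abstract refers to as a metric mismatch.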
