TL;DR
This paper introduces Fisher Decorator, a geometric approach that refines flow policies in offline RL using a local transport map and Fisher information, and reports state-of-the-art results on offline RL benchmarks.
Contribution
It proposes a novel anisotropic policy refinement method using a local transport map and Fisher information, addressing geometric mismatches in existing flow-based offline RL methods.
Findings
Achieves state-of-the-art results on offline RL benchmarks.
Addresses the geometric mismatch in policy regularization.
Provides a tractable anisotropic optimization framework.
Abstract
Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By…
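The geometric mismatch described in the abstract can be illustrated with a small numerical sketch. This is not the paper's algorithm, whose details are cut off in the excerpt above. It assumes a toy Gaussian behavior policy with a hypothetical covariance `sigma`, for which the Fisher information with respect to the mean is the inverse covariance. The names `l2_penalty`, `fisher_penalty`, `a0`, `delta_along`, and `delta_across` are illustrative and do not come from the paper.

```python
import numpy as np


def l2_penalty(delta):
    """Isotropic penalty: squared Euclidean norm of the residual displacement."""
    return float(delta @ delta)


def fisher_penalty(delta, fisher):
    """Anisotropic penalty: quadratic form of the displacement under a Fisher matrix."""
    return float(delta @ fisher @ delta)


# Assumed toy behavior policy: a 2-D Gaussian that is wide along the first
# action dimension and narrow along the second (an anisotropic data manifold).
sigma = np.diag([1.0, 0.01])

# For a Gaussian with fixed covariance, the Fisher information with respect
# to the mean is the inverse covariance.
fisher = np.linalg.inv(sigma)

# Local transport map in its simplest form: initial action plus a residual.
a0 = np.array([0.0, 0.0])            # action from the initial policy (hypothetical)
delta_along = np.array([0.1, 0.0])   # step along the high-variance direction
delta_across = np.array([0.0, 0.1])  # step of equal length across the narrow direction

refined_along = a0 + delta_along
refined_across = a0 + delta_across

for name, delta in [("along", delta_along), ("across", delta_across)]:
    print(name, "L2:", l2_penalty(delta), "Fisher:", fisher_penalty(delta, fisher))
```

Both displacements have the same length, so the isotropic penalty cannot tell them apart. Under the Fisher-weighted penalty, the step across the narrow direction costs about 100 times more in this toy setup, which is the kind of density sensitivity the abstract says the $L_2$ regularizer lacks.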
