DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction

Zhancun Mu

arXiv:2601.10471 · cs.LG · January 21, 2026

TL;DR

DeFlow is an offline reinforcement learning framework that decouples manifold modeling from value maximization, enabling stable, efficient policy extraction without backpropagating through an ODE solver.

Contribution

It learns a lightweight refinement module within an explicit, data-derived trust region of the flow manifold. This bypasses solver differentiation and removes the need to balance loss terms, while keeping the flow's iterative generation intact.

Findings

01 Outperforms existing methods on the OGBench benchmark

02 Enables efficient offline-to-online policy adaptation

03 Maintains the iterative expressivity of flow models

Abstract

We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. This way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
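
The decoupling described in the abstract can be illustrated with a rough sketch. This is not the paper's implementation: the velocity field `toy_velocity`, the linear refinement weights, the fixed `radius`, and the tanh bounding are all assumptions made for illustration (the abstract says the trust region is data-derived but does not say how it is computed). The sketch only shows the general shape of the idea: a base action comes from iterating an ODE solver over a flow model, and a separate small module adds a bounded correction to that output, so any value-driven update would only touch the correction and never differentiate through the solver.

```python
import numpy as np

def sample_flow_action(velocity_fn, state, noise, n_steps=10):
    """Euler-integrate da/dt = v(state, a, t) from t=0 to t=1, starting at noise."""
    action = noise.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        action = action + dt * velocity_fn(state, action, t)
    return action

def refine_action(base_action, state, weights, radius):
    """Add a bounded correction to the flow output.

    tanh keeps every coordinate of the correction within +/- radius,
    a simple stand-in for a trust region (assumption, not the paper's rule).
    """
    features = np.concatenate([state, base_action])
    delta = radius * np.tanh(weights @ features)
    return base_action + delta

def toy_velocity(state, action, t):
    """Hypothetical velocity field pulling the action toward a state-dependent target."""
    target = np.tanh(state[: action.shape[0]])
    return (target - action) / (1.0 - t + 1e-3)

rng = np.random.default_rng(0)
state = rng.normal(size=4)
noise = rng.normal(size=2)
weights = 0.1 * rng.normal(size=(2, 6))  # stand-in for a trained refinement module
radius = 0.05                            # hypothetical trust-region radius

# The base action is produced by the full iterative solver and then treated
# as a constant input: the refinement never needs gradients through the loop.
base = sample_flow_action(toy_velocity, state, noise)
refined = refine_action(base, state, weights, radius)

print("base action:   ", base)
print("refined action:", refined)
print("max |delta|:   ", np.max(np.abs(refined - base)))
```

In an autodiff framework, the base action would be wrapped in a stop-gradient before being passed to the refinement module, so that a value-maximization loss updates only the refinement weights while the flow model is trained separately with its own flow matching objective.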

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis