Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning

Abdelghani Ghanem; Mounir Ghogho

arXiv:2605.06156 · cs.LG · May 11, 2026

TL;DR

This paper introduces Maximum Entropy Adjoint Matching (ME-AM), an offline RL framework that aims to improve policy expressivity and exploration by integrating entropy maximization and a mixture behavior prior within a flow-matching model.

Contribution

The paper proposes a unified approach that combines entropy regularization with a mixture prior to overcome the support binding and popularity bias of flow-based offline RL methods.

Findings

1. ME-AM outperforms existing methods on sparse-reward continuous control tasks.

2. The entropy mechanism reduces popularity bias, enabling better policy extraction.

3. The mixture prior broadens support, improving exploration in out-of-distribution regions.

Abstract

Integrating expressive generative policies, such as flow-matching models, into offline reinforcement learning (RL) allows agents to capture complex, multi-modal behaviors. While Q-learning with Adjoint Matching (QAM) stabilizes policy optimization via the continuous adjoint method, it remains inherently bound to the fixed behavior distribution. This dependence induces a popularity bias that can suppress high-reward actions in low-density regions, and creates a support binding that restricts off-manifold exploration. Existing workarounds, such as appending residual Gaussian policies, often re-introduce the expressivity bottlenecks associated with unimodal distributions. In this work, we propose Maximum Entropy Adjoint Matching (ME-AM), a unified framework that addresses these limitations within the continuous flow formulation. ME-AM incorporates two…
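
The popularity bias and support binding named in the abstract can be illustrated with a toy discrete example. The sketch below is not the ME-AM algorithm and does not use flow matching. It assumes the standard behavior-regularized closed form, in which the policy is proportional to the prior times exp(Q / temperature), and it assumes a uniform component for the mixture prior. The function name, the three-action setup, the Q-values, and the 0.8/0.2 mixture weights are all hypothetical choices made for illustration.

```python
import numpy as np


def tilted_policy(prior, q_values, temperature):
    """Return pi(a) proportional to prior(a) * exp(q(a) / temperature).

    Assumed closed form for a prior-regularized policy over a small
    discrete action set; purely illustrative.
    """
    prior = np.asarray(prior, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    # log(0) = -inf is intended here: zero-prior actions get zero mass.
    with np.errstate(divide="ignore"):
        logits = np.log(prior) + q_values / temperature
    logits = logits - logits.max()  # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()


# Hypothetical data: action 0 is popular in the dataset, action 1 is rare,
# and action 2 (the highest-value action) never appears in the dataset.
behavior = np.array([0.90, 0.10, 0.00])
q_values = np.array([1.0, 2.0, 5.0])

# Policy bound to the behavior distribution alone.
bound = tilted_policy(behavior, q_values, temperature=1.0)

# Assumed mixture prior: mostly behavior, plus a small uniform component
# that gives every action nonzero prior mass.
uniform = np.full(3, 1.0 / 3.0)
mixture = 0.8 * behavior + 0.2 * uniform
widened = tilted_policy(mixture, q_values, temperature=1.0)

print("bound to behavior:", np.round(bound, 3))
print("with mixture prior:", np.round(widened, 3))
```

In this toy setting the behavior-bound policy still prefers the popular action 0 over the higher-value action 1, and it assigns exactly zero probability to the unseen action 2. With the assumed mixture prior, action 2 receives nonzero mass and can be selected.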
