Q-learning with Adjoint Matching

Qiyang Li; Sergey Levine

arXiv:2601.14234 · cs.LG · May 20, 2026

TL;DR

Q-learning with Adjoint Matching (QAM) is a stable, unbiased method for optimizing expressive diffusion and flow policies in continuous-action reinforcement learning. It outperforms prior approaches on sparse-reward tasks in both offline and offline-to-online settings.

Contribution

QAM uses adjoint matching to optimize flow and diffusion policies with the critic's gradient, avoiding the numerical instability of backpropagating through the policy's multi-step denoising process.

Findings

01. QAM outperforms prior methods on sparse-reward tasks.

02. QAM yields unbiased, expressive policies at the optimum.

03. QAM is effective in both offline and offline-to-online RL settings.

Abstract

We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which…
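
The setting the abstract describes can be made concrete with a toy sketch. The code below is not QAM and does not implement adjoint matching. It only illustrates the two ingredients the abstract names: a policy that produces an action through a multi-step denoising process, and the critic's first-order information (the gradient of Q with respect to the action). Every name here (`velocity`, `sample_action`, `q_value`, `q_action_gradient`, `target`, `goal`) is hypothetical, and the hand-written velocity field and quadratic critic are assumptions standing in for learned networks.

```python
import numpy as np

def velocity(action, t, target):
    # Hypothetical toy velocity field (an assumption, not the paper's network):
    # it pulls the current sample toward `target`. A learned field would also
    # condition on the time `t` and on the state; `t` is unused here.
    return target - action

def sample_action(noise, target, num_steps=10):
    # Multi-step denoising as Euler integration of da/dt = velocity(a, t),
    # starting from noise at t = 0 and ending with the action at t = 1.
    action = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        action = action + dt * velocity(action, k * dt, target)
    return action

def q_value(action, goal):
    # Toy critic (assumption): highest when the action equals `goal`.
    return -float(np.sum((action - goal) ** 2))

def q_action_gradient(action, goal):
    # First-order information of the toy critic: dQ/da, derived by hand.
    return -2.0 * (action - goal)

noise = np.array([1.0, -1.0])
target = np.array([0.5, 0.5])
goal = np.array([0.0, 1.0])

action = sample_action(noise, target)
grad = q_action_gradient(action, goal)
print("action:", action)
print("dQ/da :", grad)
```

Improving the policy by differentiating `q_value(sample_action(...))` with respect to the policy's parameters would chain through all `num_steps` Euler updates. That is the direct backpropagation the abstract describes as numerically unstable. Using only `q_value` and ignoring `grad` corresponds to the value-only workaround the abstract mentions.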

Code & Models

Repositories

colinqiyangli/qam (GitHub)

Videos

Q-Learning with Adjoint Matching (SlidesLive)