Discrete Adjoint Matching
Oswin So, Brian Karrer, Chuchu Fan, Ricky T. Q. Chen, Guan-Horng Liu

TL;DR
This paper introduces Discrete Adjoint Matching (DAM), a method for fine-tuning discrete generative models, such as diffusion-based language models, that adapts Adjoint Matching from continuous to discrete state spaces.
Contribution
DAM is the first discrete variant of Adjoint Matching, enabling effective fine-tuning of discrete models using a new statistical estimator derived from the original continuous framework.
Findings
DAM outperforms baseline methods on synthetic tasks
DAM effectively handles discrete state spaces in language models
The approach opens new avenues for adjoint-based estimators in discrete domains
Abstract
Computational methods for solving entropy-regularized reward optimization -- a class of problems widely used for fine-tuning generative models -- have advanced rapidly. Among these, Adjoint Matching (AM, Domingo-Enrich et al., 2025) has proven highly effective in continuous state spaces with differentiable rewards. Transferring these practical successes to discrete generative modeling, however, remains particularly challenging and largely unexplored, mainly due to the drastic shift in generative model classes to discrete state spaces, which are nowhere differentiable. In this work, we propose Discrete Adjoint Matching (DAM) -- a discrete variant of AM for fine-tuning discrete generative models characterized by Continuous-Time Markov Chains, such as diffusion-based large language models. The core of DAM is the introduction of the discrete adjoint -- an estimator of the optimal solution to the…
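The abstract names entropy-regularized reward optimization without writing it out. A minimal sketch, assuming the common KL-regularized form (maximize $\mathbb{E}_p[r] - \lambda\,\mathrm{KL}(p \,\|\, p_{\text{base}})$ with a temperature $\lambda$ introduced here for illustration): on a finite state space the maximizer is the base distribution reweighted by $\exp(r/\lambda)$, and the optimal value equals $\lambda \log Z$ with $Z = \sum_x p_{\text{base}}(x)\, e^{r(x)/\lambda}$. The helper names and toy numbers are hypothetical; this shows the target a fine-tuned model should match, not DAM's training procedure.

```python
import numpy as np

def tilted_optimum(p_base, reward, lam):
    """Closed-form maximizer of E_p[r] - lam * KL(p || p_base) over distributions p."""
    logits = np.log(p_base) + reward / lam
    logits -= logits.max()            # stabilize the exponential before normalizing
    p_star = np.exp(logits)
    return p_star / p_star.sum()

def objective(p, p_base, reward, lam):
    """Entropy-regularized reward: expected reward minus lam times KL(p || p_base)."""
    kl = np.sum(p * (np.log(p) - np.log(p_base)))
    return np.dot(p, reward) - lam * kl

p_base = np.array([0.5, 0.3, 0.2])   # toy "pretrained" distribution over 3 discrete states
reward = np.array([0.0, 1.0, 2.0])   # toy reward for each state
lam = 0.5                            # hypothetical regularization temperature

p_star = tilted_optimum(p_base, reward, lam)
log_Z = np.log(np.sum(p_base * np.exp(reward / lam)))
print(p_star, objective(p_star, p_base, reward, lam), lam * log_Z)
```

Enumerating every state is only feasible for toy problems; for a sequence model the normalizer $Z$ is intractable, so the optimum has to be reached through estimation rather than computed directly.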
Peer Reviews
Decision: ICLR 2026 Poster
- The authors provide a deep theoretical analysis to justify their discrete version of AM. They use fixed-point equations to prove that their practical algorithm is guaranteed to converge to the true optimal solution.
- The authors address a computationally intractable problem in their theoretically optimal solution. They then methodically build a practical solution: estimation via sampling, approximating the correction factor by sampling a few possible futures (K
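The review describes the correction factor as being approximated by sampling a few possible futures. The sketch below is our own illustration of that idea under stated assumptions, not the paper's estimator: for a CTMC on $[0, 1]$ with base rates $Q(x \to y)$, the KL-optimal process multiplies each rate by $V_t(y)/V_t(x)$, where $V_t(x) = \mathbb{E}[\exp(r(X_1)/\lambda) \mid X_t = x]$ (a Doob $h$-transform), and $V_t(x)$ can be estimated from $K$ simulated futures. For simplicity the toy base chain is time-homogeneous; the rate matrix, reward, and function names are hypothetical, and `exact_value` (uniformization) exists only to check the Monte Carlo estimate.

```python
import numpy as np

def simulate_to_end(Q, x, t, rng):
    """Gillespie simulation of a time-homogeneous CTMC (rate matrix Q) from state x at time t up to time 1."""
    while True:
        rates = Q[x].copy()
        rates[x] = 0.0                      # keep only off-diagonal jump rates
        total = rates.sum()
        if total <= 0.0:
            return x                        # absorbing state: no further jumps
        t += rng.exponential(1.0 / total)   # holding time ~ Exp(total)
        if t >= 1.0:
            return x                        # horizon reached before the next jump
        x = rng.choice(len(rates), p=rates / total)

def mc_value(Q, x, t, reward, lam, K, rng):
    """Monte Carlo estimate of V_t(x) = E[exp(r(X_1)/lam) | X_t = x] from K sampled futures."""
    ends = [simulate_to_end(Q, x, t, rng) for _ in range(K)]
    return np.mean(np.exp(reward[ends] / lam))

def exact_value(Q, t, reward, lam, n_terms=80):
    """Reference V_t for every state, exp((1 - t) Q) @ exp(r/lam), computed by uniformization."""
    q = np.max(-np.diag(Q))                 # uniformization rate (assumed > 0)
    P = np.eye(len(Q)) + Q / q              # stochastic matrix of the uniformized chain
    h = np.exp(reward / lam)
    s = q * (1.0 - t)
    out, term, weight = np.zeros_like(h), h.copy(), np.exp(-s)
    for n in range(n_terms):
        out += weight * term                # Poisson(s) weight times P^n h
        term = P @ term
        weight *= s / (n + 1)
    return out

def tilted_rates(Q, V):
    """h-transform: off-diagonal rates Q(x->y) * V(y) / V(x); diagonal reset so each row sums to zero."""
    Q_star = Q * V[None, :] / V[:, None]
    np.fill_diagonal(Q_star, 0.0)
    np.fill_diagonal(Q_star, -Q_star.sum(axis=1))
    return Q_star

rng = np.random.default_rng(0)
Q = np.array([[-1.0, 0.5, 0.5],
              [0.3, -0.6, 0.3],
              [0.2, 0.2, -0.4]])            # toy base rate matrix (rows sum to zero)
reward = np.array([0.0, 1.0, 2.0])
lam, t, K = 1.0, 0.5, 4000

V_exact = exact_value(Q, t, reward, lam)
V_mc = np.array([mc_value(Q, x, t, reward, lam, K, rng) for x in range(len(Q))])
Q_star = tilted_rates(Q, V_mc)              # fine-tuned rates built from the sampled estimate
print(V_exact, V_mc)
```

Each sampled future stands in for a model rollout, so the cost grows linearly in $K$; this is the computational trade-off reviewers raise about $K$ forward passes per training step.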
- The algorithm requires $K$ model forward passes per training step to build its estimator. While this is clearly effective on an 8B model, the cost of fine-tuning much larger models (e.g., 70B+) is not discussed. A small experiment reporting training time vs. final accuracy for DAM and D1 would make the paper's practical claims much stronger.
- A valuable addition to the empirical analysis would be an ablation study on the number of samples $K$ used in the importance-weighted estimator.
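To make the suggested ablation concrete, here is a hedged toy version using generic self-normalized importance sampling (our choice; the paper's importance-weighted estimator may be defined differently): draw $K$ samples from a base distribution, weight each by $\exp(r/\lambda)$, and measure how the error in estimating an expectation under the tilted distribution shrinks as $K$ grows. All names and numbers are hypothetical, and each draw from `p_base` stands in for a model forward pass.

```python
import numpy as np

def snis_estimate(fvals, reward, lam, p_base, K, rng):
    """Self-normalized IS estimate of E_{p*}[f], p* proportional to p_base * exp(reward/lam), from K base samples."""
    xs = rng.choice(len(p_base), size=K, p=p_base)
    log_w = reward[xs] / lam
    w = np.exp(log_w - log_w.max())   # max-shift for stability; it cancels after normalization
    return np.sum(w * fvals[xs]) / np.sum(w)

def rmse_by_K(fvals, reward, lam, p_base, Ks, n_trials=500, seed=0):
    """Empirical root-mean-square error of the SNIS estimate for each sample budget K."""
    rng = np.random.default_rng(seed)
    p_star = p_base * np.exp(reward / lam)
    p_star /= p_star.sum()
    truth = np.dot(p_star, fvals)     # exact target, available only because the toy space is tiny
    out = {}
    for K in Ks:
        ests = np.array([snis_estimate(fvals, reward, lam, p_base, K, rng) for _ in range(n_trials)])
        out[K] = float(np.sqrt(np.mean((ests - truth) ** 2)))
    return out

p_base = np.array([0.5, 0.3, 0.2])
reward = np.array([0.0, 1.0, 2.0])
fvals = reward.copy()                 # estimate the expected reward under the tilted distribution
errors = rmse_by_K(fvals, reward, 0.5, p_base, Ks=[4, 16, 64, 256])
for K, e in errors.items():
    print(f"K={K:4d}  RMSE={e:.4f}")
```

Plotting this RMSE against $K$ (and against wall-clock time per step) gives the kind of cost-versus-accuracy curve the review requests; for self-normalized importance sampling the error typically falls roughly like $1/\sqrt{K}$.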
1. The motivation is clear and significant: the paper addresses the need for reward-guided fine-tuning of discrete diffusion-based models.
2. The theory seems to be sound.
This seems to be quite a good paper, but I am not a theory expert, so I will watch for any issues raised by other reviewers. I also want to raise a question about the performance of LLaDA-8B on GSM8K. According to [A], the base LLaDA model scores 80+ on GSM8K, but in your paper its performance is 60-70. Could you please explain this gap? Reference: [A] Revolutionizing Reinforcement Learning Framework for Diffusion Large
1. **Clear conceptual motivation:** The paper addresses a timely and well-motivated gap: extending adjoint-based optimization methods, previously limited to continuous diffusion models, to the discrete generative setting, which is crucial for language and symbolic models.
2. **Principled extension of Adjoint Matching:** DAM is a nontrivial discrete analogue of Adjoint Matching (AM), retaining its optimization-by-simulation philosophy while adapting it to the constraints of discrete-time, discre
1. **Clarity and depth of the theoretical exposition:** The theoretical development is solid and well-motivated, but occasionally dense. Some key derivations, particularly the transition from Dynkin's formulation to the discrete adjoint system, could be presented with more intuition and interpretive discussion to help the reader understand the underlying mechanics beyond the formal algebra.
2. **Limited discussion of importance sampling techniques:** The paper briefly introduces importance weight
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Formal Methods in Verification
