Discrete Tilt Matching

Yuyuan Chen; Shiyi Wang; Peter Potaptchik; Jaeyeon Kim; Michael S. Albergo

arXiv:2604.18739·cs.LG·May 20, 2026

TL;DR

This paper introduces Discrete Tilt Matching (DTM), a likelihood-free fine-tuning method for masked diffusion large language models that improves training stability and performance on reasoning tasks.

Contribution

The paper proposes DTM, a likelihood-free fine-tuning approach for masked diffusion LLMs. DTM takes the form of a weighted cross-entropy objective with an explicit minimizer and admits control variates that improve training stability.

Findings

1. On a synthetic maze-planning task, DTM's annealing schedule and control variates improve training stability and prevent mode collapse.

2. Fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown.

3. DTM remains competitive on MATH500 and GSM8K.

Abstract

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while…
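To make the phrase "weighted cross-entropy objective" concrete, the following is a minimal NumPy sketch of a generic reward-tilted, weighted cross-entropy over masked positions. It is an illustration under stated assumptions, not the paper's objective: the function names (`tilt_weights`, `weighted_masked_cross_entropy`), the temperature `beta`, the self-normalized exponential weighting, and the per-sample averaging over masked tokens are all hypothetical choices made here. The control variates and annealing schedule mentioned in the abstract are not modeled.

```python
import numpy as np

def tilt_weights(rewards, beta):
    """Self-normalized weights proportional to exp(reward / beta).

    Assumption: a simple exponential tilt; `beta` is a hypothetical temperature.
    """
    z = np.asarray(rewards, dtype=float) / beta
    z = z - z.max()              # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def weighted_masked_cross_entropy(logits, targets, masked, weights):
    """Weighted cross-entropy restricted to masked positions.

    logits:  (N, L, V) model scores for each position and vocabulary entry
    targets: (N, L)    integer token ids of the clean sequence
    masked:  (N, L)    boolean, True where the token is masked in the input
    weights: (N,)      per-sample weights, e.g. from tilt_weights
    """
    logp = log_softmax(logits)
    # Negative log-probability of the target token at every position.
    token_nll = -np.take_along_axis(logp, targets[..., None], axis=-1)[..., 0]
    # Average over masked positions only (guard against all-unmasked rows).
    n_masked = np.maximum(masked.sum(axis=1), 1)
    per_sample = (token_nll * masked).sum(axis=1) / n_masked
    return float((weights * per_sample).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, L, V = 4, 6, 5                       # toy batch, length, vocabulary
    logits = rng.normal(size=(N, L, V))
    targets = rng.integers(0, V, size=(N, L))
    masked = rng.random((N, L)) < 0.5
    rewards = np.array([1.0, 0.0, 0.0, 1.0])
    w = tilt_weights(rewards, beta=0.5)
    print("weights:", w)
    print("loss:", weighted_masked_cross_entropy(logits, targets, masked, w))
```

In this sketch, samples with higher reward contribute more to the cross-entropy, and no sequence-level likelihood is ever evaluated; only per-token predictions at masked positions enter the loss. A smaller `beta` concentrates the weight on the highest-reward samples.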
