Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

Hanhan Zhou, Shamik Roy, Rashmi Gangadharaiah

arXiv:2605.10971 · cs.LG · May 13, 2026

TL;DR

This paper introduces an adaptive intervention scheduling method for discrete diffusion language models that substantially improves control precision over uniform intervention by targeting the denoising stages in which each attribute forms.

Contribution

The authors propose a novel adaptive scheduler for attribute steering in discrete diffusion language models (DLMs) that concentrates interventions on the denoising stages where the target attribute is actively forming, and it outperforms uniform intervention strategies.
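To make the contrast between uniform and adaptive scheduling concrete, here is a minimal Python sketch, not the authors' scheduler. It spreads a fixed steering budget either evenly over all denoising steps or as a bump over an assumed formation window. The function names, the Gaussian window shape, and the window parameters (a topic window centered at 1% of the process, loosely based on the abstract's report that topic commits within the first 2% of denoising) are hypothetical.

```python
import math
from typing import List


def uniform_schedule(num_steps: int, budget: float) -> List[float]:
    """Spread a fixed steering budget evenly over every denoising step."""
    return [budget / num_steps] * num_steps


def windowed_schedule(num_steps: int, budget: float,
                      center: float, width: float) -> List[float]:
    """Spend the same total budget, concentrated on an assumed formation window.

    `center` and `width` are fractions of the denoising process
    (0.0 = first, most-masked step; 1.0 = last step). The Gaussian
    shape is a hypothetical choice, not taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    weights = []
    for t in range(num_steps):
        frac = (t + 0.5) / num_steps  # midpoint of step t's slice of the process
        weights.append(math.exp(-0.5 * ((frac - center) / width) ** 2))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("window lies entirely outside the denoising process")
    # Normalize so the windowed schedule uses exactly the same budget as uniform.
    return [budget * w / total for w in weights]


if __name__ == "__main__":
    steps, budget = 100, 1.0
    early = int(0.02 * steps)  # number of steps in the first 2% of denoising
    uni = uniform_schedule(steps, budget)
    # Hypothetical topic window: centered at 1% of the process, 1% wide.
    topic = windowed_schedule(steps, budget, center=0.01, width=0.01)
    print("uniform budget spent in first 2%: ", round(sum(uni[:early]), 3))
    print("windowed budget spent in first 2%:", round(sum(topic[:early]), 3))
```

With equal total budgets, the uniform schedule spends only 2% of its strength in the first 2% of steps, while the windowed schedule spends most of it there. Per-step strengths like these would then scale whatever intervention is applied at step t; the form of that intervention is left open here.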

Findings

01

Adaptive scheduling improves steering accuracy across multiple tasks.

02

The method maintains high generation quality while achieving up to 93% control strength.

03

The adaptive approach outperforms the uniform-intervention baseline, especially in multi-attribute control.

Abstract

Discrete diffusion language models (DLMs) generate text by iteratively denoising all positions in parallel, offering an alternative to autoregressive models. Controlled generation methods for DLMs, borrowed from autoregressive models, apply uniform intervention at every denoising step. We show that this uniform schedule degrades quality, and that the damage compounds when multiple attributes are steered jointly. To diagnose the failure, we train sparse autoencoders on four DLMs (124M to 8B parameters) and find that different attributes commit on distinct schedules, varying in timing, sharpness, and magnitude. For instance, topic commits within the first 2% of denoising, whereas sentiment emerges gradually over 20% of the process. Consequently, uniform intervention wastes steering capacity on steps where the target attribute has already solidified or has yet to emerge. We propose a novel adaptive…
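The abstract characterizes each attribute's commit schedule by timing, sharpness, and magnitude. One way such quantities could be read off a per-step feature trace is sketched here in Python. The input (a hypothetical mean SAE activation of an attribute feature at each denoising step) and the metric definitions (10%, 50%, and 90% crossing points of the normalized net change) are assumptions for illustration, not the paper's definitions.

```python
from typing import Dict, List


def commit_profile(trace: List[float], lo: float = 0.1,
                   hi: float = 0.9) -> Dict[str, float]:
    """Summarize when an attribute feature commits during denoising.

    trace[t] is a hypothetical per-step statistic, e.g. the mean activation of
    an attribute-related SAE feature at denoising step t (t = 0 is the first,
    most-masked step). Step positions are reported as fractions of the
    process so traces with different step counts are comparable.
    """
    if not 0 < lo < 0.5 < hi <= 1:
        raise ValueError("levels must satisfy 0 < lo < 0.5 < hi <= 1")
    last = len(trace) - 1
    if last < 1:
        raise ValueError("need a trace covering at least two steps")
    start, end = trace[0], trace[-1]
    magnitude = end - start  # net change in the feature (sign kept)
    if magnitude == 0:
        raise ValueError("feature shows no net change, so no commit point")
    # Normalized progress: 0 at the first step, exactly 1 at the final step.
    progress = [(a - start) / magnitude for a in trace]

    def first_step(level: float) -> int:
        # First step whose progress reaches `level`; the final step always does.
        return next(t for t, p in enumerate(progress) if p >= level)

    return {
        "onset": first_step(lo) / last,                     # timing: change begins
        "midpoint": first_step(0.5) / last,                 # timing: half committed
        "width": (first_step(hi) - first_step(lo)) / last,  # smaller = sharper
        "magnitude": magnitude,
    }


if __name__ == "__main__":
    # Synthetic traces shaped after the abstract's examples (not real data).
    topic_like = [0.0] * 2 + [1.0] * 98                      # near-instant jump
    sentiment_like = [min(t / 20, 1.0) for t in range(100)]  # ramp over ~20% of steps
    print("topic-like:", commit_profile(topic_like))
    print("sentiment-like:", commit_profile(sentiment_like))
```

The topic-like trace yields a width of zero (a sharp commit), while the ramp yields a width of roughly 16% of the process. First-crossing definitions like these are sensitive to noise, so real traces would likely need smoothing or averaging over many prompts before such a summary could drive a schedule.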
