Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models

Injin Kong; Hyoungjoon Lee; Yohan Jo

arXiv:2601.14758 · cs.LG · March 20, 2026

Open Access

TL;DR

This paper examines how internal mechanisms change when autoregressive language models are post-trained into masked diffusion models, and finds a systematic shift in circuitry and processing strategy that depends on task type.

Contribution

The paper provides a comparative circuit analysis showing how post-training reorganizes internal mechanisms, with global planning tasks rewired away from the pathways inherited from the autoregressive model.

Findings

01. MDMs preserve autoregressive circuitry for local causal tasks

02. MDMs rewire and shift processing for global planning tasks

03. Semantic processing transitions from localized to distributed in MDMs

Abstract

Post-training pretrained autoregressive models (ARMs) into masked diffusion models (MDMs) has emerged as a cost-effective way to overcome the limitations of sequential generation. Yet the internal algorithmic changes induced by this shift remain poorly understood, leaving it unclear whether post-trained MDMs acquire genuine bidirectional reasoning or merely repackage autoregressive heuristics. We address this question through a comparative circuit analysis of ARMs and their MDM counterparts. Our analysis reveals a systematic "mechanism shift" that depends on the structural nature of the task. MDMs largely preserve autoregressive circuitry for tasks driven by local causal dependencies, but for global planning tasks they abandon initialized pathways and exhibit distinct rewiring with increased early-layer processing. At the semantic level, we observe a transition from sharp, localized…
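
As a rough illustration of what "preserving" versus "abandoning" circuitry could mean in a comparative circuit analysis, the sketch below scores the overlap between two circuits represented as sets of edges between model components. This is only an assumption for illustration: the abstract does not state which overlap metric the authors use, and the Jaccard measure, the `circuit_overlap` helper, and the component names (`L0.H1`, `L5.MLP`, and so on) are all hypothetical toy values, not results from the paper.

```python
from typing import Set, Tuple

# An edge connects an upstream component to a downstream component.
# Component names are hypothetical labels, not taken from the paper.
Edge = Tuple[str, str]


def circuit_overlap(arm_edges: Set[Edge], mdm_edges: Set[Edge]) -> float:
    """Jaccard overlap between two circuits given as edge sets.

    Returns 1.0 when both circuits are identical (including both empty)
    and 0.0 when they share no edges.
    """
    union = arm_edges | mdm_edges
    if not union:
        return 1.0
    return len(arm_edges & mdm_edges) / len(union)


# Toy circuits (made up for illustration only).
arm_circuit = {("L0.H1", "L2.H3"), ("L2.H3", "L5.MLP"), ("L5.MLP", "logits")}

# Hypothetical MDM circuit for a local task: identical to the ARM circuit.
mdm_local = {("L0.H1", "L2.H3"), ("L2.H3", "L5.MLP"), ("L5.MLP", "logits")}

# Hypothetical MDM circuit for a global task: mostly different, earlier edges.
mdm_global = {("L0.H4", "L1.MLP"), ("L1.MLP", "logits"), ("L5.MLP", "logits")}

print(circuit_overlap(arm_circuit, mdm_local))
print(circuit_overlap(arm_circuit, mdm_global))
```

Under this toy setup, a high score would correspond to the "largely preserved" case described for local causal tasks and a low score to the "distinct rewiring" case described for global planning tasks.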

Taxonomy

Topics: AI-based Problem Solving and Planning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications