Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

Injin Kong; Hyoungjoon Lee; Yohan Jo

arXiv:2605.14368·cs.CL·May 15, 2026


TL;DR

This paper introduces DiHAL, a geometry-guided diffusion-transformer hybrid. It scores the layers of a pretrained transformer with geometry-based proxies, replaces the lower layers with a diffusion bridge, and reconstructs the selected layer's hidden state rather than tokens, which avoids direct continuous-to-discrete recovery.

Contribution

DiHAL integrates diffusion into a pretrained transformer by scoring layers with geometry-based proxies and training a diffusion bridge to reconstruct the hidden state at the selected layer. The upper transformer layers and the original LM head are retained.

Findings

01 Geometry score predicts effective diffusion layers.

02 Hidden-state recovery outperforms continuous diffusion baselines.

03 Shallow diffusion insertion improves language model denoising.
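
To make the first finding concrete, here is a minimal sketch of ranking candidate layers by a geometry proxy. This summary does not say which proxies DiHAL uses, so the participation ratio (an effective-dimensionality measure) is a stand-in chosen purely for illustration. The rule that a lower value marks a more diffusion-friendly layer is likewise an assumption, and `participation_ratio` and `select_insertion_layer` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # n_samples rows, each a d-dimensional hidden state


def participation_ratio(hidden: Matrix) -> float:
    """Effective dimensionality (tr C)^2 / tr(C^2) of the hidden-state covariance C.

    Placeholder geometry proxy, NOT the score used in the paper.
    """
    n, d = len(hidden), len(hidden[0])
    means = [sum(row[j] for row in hidden) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in hidden]
    # Covariance matrix of the centered hidden states (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    # For a symmetric matrix, tr(C^2) is the sum of its squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace ** 2 / trace_sq


def select_insertion_layer(
    hidden_by_layer: Sequence[Matrix], max_depth: int
) -> Tuple[int, List[float]]:
    """Score the first `max_depth` layers and return the index with the lowest proxy.

    Treating "lowest" as "best" is an illustrative assumption.
    """
    scores = [participation_ratio(h) for h in hidden_by_layer[:max_depth]]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Restricting the search to the first `max_depth` layers mirrors the summary's focus on shallow insertion points. Any other per-layer score could be swapped in without changing the selection logic.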

Abstract

Continuous diffusion language models lag behind autoregressive transformers, partly because diffusion is applied in spaces poorly suited to language denoising and token recovery. We propose DiHAL, a geometry-guided diffusion-transformer hybrid that asks where diffusion should enter a pretrained transformer. DiHAL scores layers with geometry-based proxies, selects a diffusion-friendly hidden-state interface, and replaces the lower transformer prefix with a diffusion bridge while retaining the upper layers and original LM head. By reconstructing the selected-layer hidden state rather than tokens, DiHAL avoids direct continuous-to-discrete recovery. Experiments on 8B-scale backbones show that the geometry score predicts effective shallow insertion layers under a fixed bridge-training protocol and that hidden-state recovery improves over continuous diffusion baselines in a diagnostic…
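
The architecture described in the abstract can be sketched as a toy forward pass: a diffusion bridge produces the hidden state at the selected layer, and the retained upper layers and LM head decode it. The code below is an illustrative sketch, not the authors' implementation. The layers are trivial functions, `oracle_bridge_step` is an idealized stand-in for a trained denoiser, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_layers(layers: Sequence[Layer], h: Vector) -> Vector:
    """Apply a stack of layers in order."""
    for layer in layers:
        h = layer(h)
    return h


def hybrid_forward(bridge_step, upper_layers, lm_head, noise, num_steps):
    """Denoise toward the layer-k hidden state, then reuse the upper layers and LM head."""
    h = list(noise)
    for t in reversed(range(num_steps)):
        h = bridge_step(h, t)  # one reverse-diffusion step in hidden-state space
    return lm_head(run_layers(upper_layers, h))


# Toy "pretrained" model with four layers and an argmax LM head.
layers: List[Layer] = [
    lambda h: [x + 1.0 for x in h],
    lambda h: [2.0 * x for x in h],
    lambda h: [x - 0.5 for x in h],
    lambda h: [x * x for x in h],
]


def lm_head(h: Vector) -> int:
    """Return the index of the largest logit."""
    return max(range(len(h)), key=lambda i: h[i])


k = 2  # hypothetical insertion layer: layers[:k] are replaced by the bridge
embedding = [0.1, 0.7, -0.3]
target = run_layers(layers[:k], embedding)  # true layer-k hidden state


def oracle_bridge_step(h: Vector, t: int) -> Vector:
    """Idealized denoiser: moves a fraction 1/(t+1) of the way to the target,
    so the final step (t == 0) lands on the target hidden state."""
    return [x + (y - x) / (t + 1) for x, y in zip(h, target)]


full_token = lm_head(run_layers(layers, embedding))
hybrid_token = hybrid_forward(
    oracle_bridge_step, layers[k:], lm_head, noise=[0.0, 0.0, 0.0], num_steps=4
)
```

Because the bridge only has to match a continuous hidden state, mapping back to a discrete token is left to the unchanged upper layers and LM head. With a perfect bridge, the hybrid and the full model predict the same token in this toy setup.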
