Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Bozhou Li; Yushuo Guan; Haolin Li; Bohan Zeng; Yiyan Ji; Yue Ding; Pengfei Wan; Kun Gai; Yuanxing Zhang; Wentao Zhang

arXiv:2602.03510 · cs.CV · February 4, 2026

TL;DR

This paper introduces a multi-layer LLM feature weighting framework for diffusion transformers (DiTs). It improves text-image alignment and generation quality by dynamically weighting LLM hidden states from different LLM layers according to diffusion time and DiT depth.

Contribution

It proposes a unified normalized convex fusion framework with lightweight gates. Among the conditioning strategies it compares, depth-wise semantic routing proves the most effective for diffusion transformers.
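
The following minimal NumPy sketch illustrates what "normalized convex fusion" over LLM layers could look like. It makes two assumptions. First, the gate weights come from a softmax over learnable per-layer logits, which makes them non-negative and sum to one. Second, each layer's hidden states are RMS-normalized before mixing, because hidden-state scales differ across LLM depth. Both choices, along with the names `rms_normalize` and `convex_fuse`, are illustrative and are not the paper's implementation.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def rms_normalize(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Rescale each feature vector (last axis) to unit root-mean-square (assumed normalization)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)


def convex_fuse(hidden_states: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Convex combination of per-layer LLM hidden states.

    hidden_states: [num_layers, seq_len, dim] stacked LLM layer outputs
    gate_logits:   [num_layers] learnable scores (hypothetical lightweight gate)
    returns:       [seq_len, dim] fused text condition
    """
    normed = rms_normalize(hidden_states)          # put layers on a comparable scale
    weights = softmax(gate_logits)                 # non-negative, sums to 1
    return np.tensordot(weights, normed, axes=(0, 0))
```

With all-zero logits the gate reduces to a plain average of the normalized layers. A single dominant logit recovers the single-layer conditioning that the abstract describes as common practice.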

Findings

01 Depth-wise semantic routing improves text-image alignment (+9.97 on the GenAI-Bench Counting task).

02 Purely time-wise fusion can reduce visual fidelity.

03 Trajectory-aware signals are crucial for robust time-dependent conditioning.

Abstract

Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced semantic hierarchy across LLM layers and non-stationary denoising dynamics over both diffusion time and network depth. To better match the dynamic process of DiT generation and thereby enhance the diffusion model's generative capability, we introduce a unified normalized convex fusion framework equipped with lightweight gates to systematically organize multi-layer LLM hidden states via time-wise, depth-wise, and joint fusion. Experiments establish Depth-wise Semantic Routing as the superior conditioning strategy, consistently improving text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task). Conversely, we find that purely time-wise fusion can paradoxically…
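
The three fusion axes named in the abstract can be contrasted using a hypothetical gate that derives layer weights from one of three sources: the DiT block index (depth-wise), the diffusion time (time-wise), or both (joint). This sketch assumes a learnable logit table per DiT block, a linear projection of sinusoidal timestep features, and additive combination of the two in joint mode. `RoutingGate`, `timestep_features`, and `routed_condition` are hypothetical names. The paper's actual gates, including its trajectory-aware signals, may be parameterized differently.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def timestep_features(t: float, num_freqs: int = 4) -> np.ndarray:
    """Sinusoidal features of a diffusion time t in [0, 1] (assumed parameterization)."""
    freqs = 2.0 ** np.arange(num_freqs)
    return np.concatenate([np.sin(np.pi * freqs * t), np.cos(np.pi * freqs * t)])


class RoutingGate:
    """Hypothetical lightweight gate that outputs convex weights over LLM layers."""

    def __init__(self, num_llm_layers: int, num_dit_blocks: int, mode: str,
                 num_freqs: int = 4, seed: int = 0):
        if mode not in ("time", "depth", "joint"):
            raise ValueError(f"unknown mode: {mode}")
        rng = np.random.default_rng(seed)
        self.mode = mode
        self.num_freqs = num_freqs
        # Depth-wise parameters: one logit vector per DiT block, independent of t.
        self.depth_logits = rng.normal(scale=0.1, size=(num_dit_blocks, num_llm_layers))
        # Time-wise parameters: linear map from timestep features to logits,
        # shared by every DiT block.
        self.time_proj = rng.normal(scale=0.1, size=(2 * num_freqs, num_llm_layers))

    def weights(self, block_idx: int, t: float) -> np.ndarray:
        """Convex weights over LLM layers for one DiT block at diffusion time t."""
        logits = np.zeros(self.depth_logits.shape[1])
        if self.mode in ("depth", "joint"):
            logits = logits + self.depth_logits[block_idx]
        if self.mode in ("time", "joint"):
            logits = logits + timestep_features(t, self.num_freqs) @ self.time_proj
        return softmax(logits)


def routed_condition(hidden_states: np.ndarray, gate: RoutingGate,
                     block_idx: int, t: float) -> np.ndarray:
    """Text condition for one DiT block: [num_layers, seq, dim] -> [seq, dim]."""
    w = gate.weights(block_idx, t)
    return np.tensordot(w, hidden_states, axes=(0, 0))
```

In depth mode, each DiT block reads its own fixed mix of LLM layers regardless of t. In time mode, every block shares one mix that changes with t. The abstract reports the depth-wise variant as the stronger conditioning strategy.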

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Neural Network Applications