Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
Bozhou Li, Yushuo Guan, Haolin Li, Bohan Zeng, Yiyan Ji, Yue Ding, Pengfei Wan, Kun Gai, Yuanxing Zhang, Wentao Zhang

TL;DR
This paper introduces a multi-layer LLM feature weighting framework for diffusion transformers, improving text-image alignment and generative quality by dynamically organizing LLM hidden states across layers and time.
Contribution
It proposes a unified normalized convex fusion framework with lightweight gates, establishing depth-wise semantic routing as the most effective conditioning strategy for diffusion models.
Findings
Depth-wise semantic routing improves text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task)
Purely time-wise fusion can reduce visual fidelity
Trajectory-aware signals are crucial for robust time-dependent conditioning
Abstract
Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced semantic hierarchy across LLM layers and non-stationary denoising dynamics over both diffusion time and network depth. To better match the dynamic process of DiT generation and thereby enhance the diffusion model's generative capability, we introduce a unified normalized convex fusion framework equipped with lightweight gates to systematically organize multi-layer LLM hidden states via time-wise, depth-wise, and joint fusion. Experiments establish Depth-wise Semantic Routing as the superior conditioning strategy, consistently improving text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task). Conversely, we find that purely time-wise fusion can paradoxically…
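The normalized convex fusion described in the abstract can be illustrated with a minimal sketch. It assumes that each LLM layer's feature vector is first normalized and that the gate is a softmax over one logit per LLM layer, so the fusion weights are non-negative and sum to 1. The helper names (`normalize`, `convex_fuse`, `depth_wise_route`), the LayerNorm-like normalization, and the per-block logit table are hypothetical illustrations, not the authors' implementation; this page does not specify how the gates are parameterized.

```python
import math

def softmax(logits):
    # Numerically stable softmax: outputs are non-negative and sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(vec, eps=1e-6):
    # Zero-mean, unit-variance scaling of one feature vector
    # (an assumed LayerNorm-like stand-in for the "normalized" step).
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in vec]

def convex_fuse(layer_feats, gate_logits):
    # Convex combination of normalized per-layer features.
    weights = softmax(gate_logits)
    normed = [normalize(f) for f in layer_feats]
    dim = len(normed[0])
    fused = [sum(w * f[i] for w, f in zip(weights, normed)) for i in range(dim)]
    return fused, weights

def depth_wise_route(layer_feats, gate_table):
    # Assumption: gate_table[d] holds the gate logits for DiT block d,
    # so every block receives its own mixture of LLM layers.
    return [convex_fuse(layer_feats, logits) for logits in gate_table]

# Toy data: 3 LLM layers, one token with 4-dim features, 2 DiT blocks.
layer_feats = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 1.0],
    [4.0, 3.0, 2.0, 1.0],
]
gate_table = [
    [2.0, 0.0, 0.0],  # block 0 favors LLM layer 0
    [0.0, 0.0, 2.0],  # block 1 favors LLM layer 2
]
routed = depth_wise_route(layer_feats, gate_table)
```

Under the same assumptions, a time-wise variant would index the logits by diffusion timestep instead of block depth, and a joint variant would index them by both.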
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Neural Network Applications
