TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search
Kaicheng Yang, Kaisen Yang, Baiting Wu, Xun Zhang, Qianrui Yang, Haotong Qin, He Zhang, Yulun Zhang

TL;DR
TreeQ is a mixed-precision quantization framework for Diffusion Transformers (DiTs) that combines an architecture-specific search, unified optimization, and information-preserving structures to reach near-lossless low-bit performance.
Contribution
The paper presents TreeQ, which advances DiT quantization through a DiT-specific search method (Tree Structured Search, TSS), unified guidance, and a new sparse branch design, achieving state-of-the-art low-bit results.
Findings
State-of-the-art performance on DiT-XL/2 with ultra-low-bit quantization.
First near-lossless 4-bit PTQ performance on DiT models.
Effective reduction of computational and memory overhead.
Abstract
Diffusion Transformers (DiTs) have emerged as a highly scalable and effective backbone for image generation, outperforming U-Net architectures in both scalability and performance. However, their real-world deployment remains challenging due to high computational and memory demands. Mixed-Precision Quantization (MPQ), designed to push the limits of quantization, has demonstrated remarkable success in advancing U-Net quantization to sub-4-bit settings while significantly reducing computational and memory overhead. Nevertheless, its application to DiT architectures remains limited and underexplored. In this work, we propose TreeQ, a unified framework addressing key challenges in DiT quantization. First, to tackle inefficient search and proxy misalignment, we introduce Tree Structured Search (TSS). This DiT-specific approach leverages the architecture's linear properties to traverse the…
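For readers new to mixed-precision quantization, the general idea mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not TreeQ's method: it uses plain symmetric uniform quantization, and the layer names, weights, and bit assignments are hypothetical. It shows only how a per-layer bit-width configuration trades average bit budget against reconstruction error, which is the quantity a mixed-precision search would try to balance.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantize-dequantize of a list of floats.

    Generic illustration only; not the quantizer used by TreeQ.
    """
    if bits < 2:
        raise ValueError("this sketch assumes at least 2 bits")
    qmax = 2 ** (bits - 1) - 1          # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)             # all zeros: nothing to quantize
    scale = max_abs / qmax              # step size between adjacent levels
    out = []
    for v in values:
        q = round(v / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        out.append(q * scale)           # map back to floating point
    return out


def mean_squared_error(a, b):
    """Average squared difference between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def average_bits(layer_sizes, bit_config):
    """Parameter-weighted average bit-width of a per-layer configuration."""
    total = sum(layer_sizes.values())
    return sum(layer_sizes[name] * bit_config[name] for name in layer_sizes) / total


# Hypothetical toy "layers" and a hypothetical mixed-precision assignment.
layers = {
    "attn_proj": [0.5, -1.0, 0.25, 0.75],
    "mlp_fc": [0.1, -0.2, 0.05, 0.3, -0.15, 0.25],
}
bit_config = {"attn_proj": 4, "mlp_fc": 3}

sizes = {name: len(w) for name, w in layers.items()}
print("average bits:", average_bits(sizes, bit_config))
for name, weights in layers.items():
    dequantized = quantize_uniform(weights, bit_config[name])
    print(name, "MSE:", mean_squared_error(weights, dequantized))
```

A search procedure would compare many such `bit_config` candidates under a fixed average-bit budget; how TreeQ structures that search is described in the full paper, not here.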
Taxonomy
Topics: Advanced Neural Network Applications · Random lasers and scattering media · Transition Metal Oxide Nanomaterials
