EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation
Sravanth Kodavanti, Manjunath Arveti, Sowmya Vajrala, Srinivas Miriyala, Vikram N R

TL;DR
EdgeDiT introduces hardware-aware, lightweight diffusion transformers optimized for mobile NPUs, cutting parameters by 20-30%, FLOPs by 36-46%, and on-device latency by a factor of 1.65 while maintaining high-quality image synthesis for on-device deployment.
Contribution
The paper presents a systematic hardware-aware optimization framework that creates efficient diffusion transformer models tailored for mobile NPUs, enabling high-quality on-device image generation.
Findings
Achieves 20-30% parameter reduction
Reduces FLOPs by 36-46%
Reduces on-device latency by a factor of 1.65
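The findings mix two kinds of units: percentage reductions for parameters and FLOPs, and a fold factor for latency. The sketch below puts them on a common scale. It assumes the 1.65-fold figure means baseline latency divided by EdgeDiT latency; the summary does not spell this out, and the helper names are illustrative rather than taken from the paper.

```python
def percent_reduction_from_fold(fold: float) -> float:
    """Convert an N-fold reduction (baseline / new) into a percentage reduction."""
    if fold <= 0:
        raise ValueError("fold factor must be positive")
    return (1.0 - 1.0 / fold) * 100.0


def fold_from_percent_reduction(percent: float) -> float:
    """Convert a percentage reduction into the equivalent N-fold factor."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


# Assumption: "1.65-fold reduction" means baseline_latency / new_latency = 1.65.
latency_cut = percent_reduction_from_fold(1.65)
print(f"Latency reduction: {latency_cut:.1f}%")

# The reported FLOPs range, expressed as fold factors for comparison.
flops_low = fold_from_percent_reduction(36)
flops_high = fold_from_percent_reduction(46)
print(f"FLOPs reduction: {flops_low:.2f}x to {flops_high:.2f}x")
```

Under that reading, the latency figure corresponds to roughly 39% less time per generation, and the 36-46% FLOPs range corresponds to about 1.56x to 1.85x less compute, so the reported latency gain falls within the range implied by the FLOPs savings.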
Abstract
Diffusion Transformers (DiT) have established a new state-of-the-art in high-fidelity image synthesis; however, their massive computational complexity and memory requirements hinder local deployment on resource-constrained edge devices. In this paper, we introduce EdgeDiT, a family of hardware-efficient generative transformers specifically engineered for mobile Neural Processing Units (NPUs), such as the Qualcomm Hexagon and Apple Neural Engine (ANE). By leveraging a hardware-aware optimization framework, we systematically identify and prune structural redundancies within the DiT backbone that are particularly taxing for mobile data-flows. Our approach yields a series of lightweight models that achieve a 20-30% reduction in parameters, a 36-46% decrease in FLOPs, and a 1.65-fold reduction in on-device latency without sacrificing the scaling advantages or the expressive capacity of the…
