EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation

Sravanth Kodavanti; Manjunath Arveti; Sowmya Vajrala; Srinivas Miriyala; Vikram N R

arXiv:2603.28405·cs.CV·March 31, 2026

TL;DR

EdgeDiT introduces hardware-aware, lightweight diffusion transformers optimized for mobile NPUs, cutting parameters by 20-30%, FLOPs by 36-46%, and on-device latency by a factor of 1.65 while maintaining high-quality image synthesis for on-device deployment.

Contribution

The paper presents a systematic hardware-aware optimization framework that identifies and prunes structural redundancies in the DiT backbone, producing efficient diffusion transformer models tailored for mobile NPUs and enabling high-quality on-device image generation.

Findings

01

Reduces parameter count by 20-30%

02

Reduces FLOPs by 36-46%

03

Reduces on-device latency by a factor of 1.65
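
The findings above mix two units: parameters and FLOPs are given as percentage reductions, while latency is given as a fold change. The short sketch below puts them on a common scale. It is plain arithmetic on the reported figures, not code from the paper, and the helper names `remaining_fraction` and `fold_to_percent_reduction` are hypothetical.

```python
def remaining_fraction(percent_reduction: float) -> float:
    """Fraction of the baseline left after a given percentage reduction."""
    return 1.0 - percent_reduction / 100.0


def fold_to_percent_reduction(fold: float) -> float:
    """Convert an N-fold reduction (baseline / new) into a percentage reduction."""
    return (1.0 - 1.0 / fold) * 100.0


# Reported ranges, taken from the findings list.
params_remaining = (remaining_fraction(30), remaining_fraction(20))
flops_remaining = (remaining_fraction(46), remaining_fraction(36))
latency_cut_percent = fold_to_percent_reduction(1.65)

print(f"Parameters remaining: {params_remaining[0]:.0%} to {params_remaining[1]:.0%} of baseline")
print(f"FLOPs remaining: {flops_remaining[0]:.0%} to {flops_remaining[1]:.0%} of baseline")
print(f"Latency reduction: about {latency_cut_percent:.1f}%")
```

Read this way, a 1.65-fold latency reduction means the optimized model takes roughly 61% of the baseline time per run, a cut of about 39%.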

Abstract

Diffusion Transformers (DiT) have established a new state of the art in high-fidelity image synthesis; however, their massive computational complexity and memory requirements hinder local deployment on resource-constrained edge devices. In this paper, we introduce EdgeDiT, a family of hardware-efficient generative transformers specifically engineered for mobile Neural Processing Units (NPUs), such as the Qualcomm Hexagon and Apple Neural Engine (ANE). By leveraging a hardware-aware optimization framework, we systematically identify and prune structural redundancies within the DiT backbone that are particularly taxing for mobile dataflows. Our approach yields a series of lightweight models that achieve a 20-30% reduction in parameters, a 36-46% decrease in FLOPs, and a 1.65-fold reduction in on-device latency without sacrificing the scaling advantages or the expressive capacity of the…
