Designing Parameter and Compute Efficient Diffusion Transformers using Distillation
Vignesh Sundaresha

TL;DR
This paper explores how to design and distill diffusion transformers to be efficient in size and computation, enabling deployment on resource-limited edge devices for real-time image and video generation.
Contribution
The paper introduces principled guidelines for designing compact diffusion transformers and proposes two novel distillation methods tailored for edge deployment.
Findings
Achieved significant reduction in model size and computation without sacrificing performance.
Demonstrated effectiveness of proposed methods on NVIDIA Jetson Orin Nano.
Provided insights into the trade-offs between model size, speed, and accuracy.
Abstract
Diffusion Transformers (DiTs) with billions of model parameters form the backbone of popular image and video generation models like DALL·E, Stable Diffusion and Sora. Though these models are necessary in many low-latency applications like Augmented/Virtual Reality, they cannot be deployed on resource-constrained Edge devices (like Apple Vision Pro or Meta Ray-Ban glasses) due to their huge computational complexity. To overcome this, we turn to knowledge distillation and perform a thorough design-space exploration to achieve the best DiT for a given parameter size. In particular, we provide principles for how to choose design knobs such as depth, width, attention heads and distillation setup for a DiT. During the process, a three-way trade-off emerges between model performance, size and speed that is crucial for Edge implementation of diffusion. We also propose two distillation…
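To make the "design knobs" concrete, here is a rough sketch of how depth and width trade off under a fixed parameter budget. It assumes a generic transformer block (four attention projections plus a two-layer MLP with a 4x expansion) and ignores biases, normalization, embeddings and any DiT-specific conditioning layers; the function names are hypothetical and are not taken from the paper.

```python
def block_params(width: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of one generic transformer block.

    Assumption: biases, norms and conditioning layers are ignored.
    """
    attention = 4 * width * width        # Q, K, V and output projections
    mlp = 2 * mlp_ratio * width * width  # width -> mlp_ratio*width -> width
    return attention + mlp


def backbone_params(depth: int, width: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of `depth` stacked blocks."""
    return depth * block_params(width, mlp_ratio)


def head_dim(width: int, heads: int) -> int:
    """Per-head dimension; the head count splits the width and adds no weights."""
    if width % heads != 0:
        raise ValueError("width must be divisible by the number of heads")
    return width // heads


if __name__ == "__main__":
    # Two shapes with the same approximate budget: deep-and-narrow vs. shallow-and-wide.
    for depth, width in [(12, 384), (3, 768)]:
        print(depth, width, backbone_params(depth, width))
```

Under these assumptions the parameter count grows linearly with depth and quadratically with width, so halving the width frees enough budget for four times as many blocks. The number of attention heads leaves the count unchanged and only affects the per-head dimension, which is one reason the latency of two equally sized models can still differ.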
Taxonomy
Topics: Machine Learning and ELM · Membrane Separation Technologies · Quantum-Dot Cellular Automata
Methods: Softmax · Attention Is All You Need · Knowledge Distillation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
