Designing Parameter and Compute Efficient Diffusion Transformers using Distillation

Vignesh Sundaresha

arXiv:2502.14226·cs.CV·February 21, 2025

TL;DR

This paper explores how to design and distill diffusion transformers to be efficient in size and computation, enabling deployment on resource-limited edge devices for real-time image and video generation.

Contribution

The paper introduces principled guidelines for designing compact diffusion transformers and proposes two novel distillation methods tailored for edge deployment.

Findings

1. Achieved a significant reduction in model size and computation without sacrificing performance.
2. Demonstrated the effectiveness of the proposed methods on an NVIDIA Jetson Orin Nano.
3. Provided insights into the trade-offs between model size, speed, and accuracy.
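The size and speed sides of this trade-off can be roughly estimated before any training. The following is a minimal back-of-the-envelope sketch, assuming a block layout like the original DiT (fused QKV attention, an MLP with expansion ratio 4, and an adaLN-Zero layer that regresses six modulation vectors). It ignores the patch embedding, timestep/class embedders and final layer. `DiTConfig`, `block_params`, `block_flops` and `backbone_cost` are hypothetical names for illustration, not the paper's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiTConfig:
    """Hypothetical design knobs for a DiT backbone (names are illustrative)."""
    depth: int          # number of transformer blocks
    width: int          # hidden size d
    heads: int          # attention heads; must divide width
    tokens: int         # sequence length N (latent patches per image)
    mlp_ratio: int = 4  # MLP hidden size = mlp_ratio * d


def block_params(cfg: DiTConfig) -> int:
    """Weights + biases of one block: 18*d^2 + 15*d when mlp_ratio == 4."""
    d, h = cfg.width, cfg.mlp_ratio * cfg.width
    attn = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
    mlp = (d * h + h) + (h * d + d)           # fc1 + fc2
    ada_ln = d * 6 * d + 6 * d                # adaLN-Zero: shift/scale/gate for attn and MLP
    return attn + mlp + ada_ln


def block_flops(cfg: DiTConfig) -> int:
    """Approximate FLOPs (2 per multiply-accumulate) of one block for one sample."""
    d, n, h = cfg.width, cfg.tokens, cfg.mlp_ratio * cfg.width
    linear = 2 * n * (4 * d * d + 2 * d * h)  # QKV, output proj, fc1, fc2 on every token
    attn_matmuls = 2 * 2 * n * n * d          # Q @ K^T and softmax(...) @ V (softmax ignored)
    ada_ln = 2 * d * 6 * d                    # applied once to the conditioning vector
    return linear + attn_matmuls + ada_ln


def backbone_cost(cfg: DiTConfig) -> Tuple[int, int]:
    """(parameters, FLOPs) of the block stack; head count affects neither at fixed width."""
    if cfg.width % cfg.heads != 0:
        raise ValueError("width must be divisible by heads")
    return cfg.depth * block_params(cfg), cfg.depth * block_flops(cfg)


# Usage: a deep-narrow and a shallow-wide shape with almost the same parameter budget.
for cfg in (DiTConfig(depth=24, width=384, heads=6, tokens=256),
            DiTConfig(depth=6, width=768, heads=12, tokens=256)):
    params, flops = backbone_cost(cfg)
    print(f"depth={cfg.depth:2d} width={cfg.width}: "
          f"{params / 1e6:.1f}M params, {flops / 1e9:.1f} GFLOPs")
```

Under these assumptions, quartering depth while doubling width leaves the parameter count nearly unchanged (both scale with depth × width²). The attention-matmul term instead scales with depth × width × N², so the two shapes differ in FLOPs and, on real hardware, in latency. The head count changes neither estimate at fixed width, which is why its effect has to be measured on the target device.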

Abstract

Diffusion Transformers (DiTs) with billions of parameters form the backbone of popular image and video generation models such as DALL·E, Stable Diffusion and Sora. Although these models are needed in many low-latency applications such as augmented/virtual reality, they cannot be deployed on resource-constrained edge devices (such as Apple Vision Pro or Meta Ray-Ban glasses) because of their huge computational complexity. To overcome this, we turn to knowledge distillation and perform a thorough design-space exploration to find the best DiT for a given parameter size. In particular, we provide principles for choosing design knobs such as depth, width, number of attention heads and the distillation setup for a DiT. In the process, a three-way trade-off emerges between model performance, size and speed that is crucial for edge implementation of diffusion. We also propose two distillation…
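The abstract does not spell out the two distillation methods, so the following is only a generic illustration of output-level knowledge distillation for a diffusion denoiser, not the paper's method. A compact student is trained to match a large teacher's noise prediction on the same noisy input, blended with the standard epsilon-prediction loss; the blend weight is one example of a "distillation setup" knob. `kd_loss`, its `weight` parameter and the toy lambdas standing in for real DiTs are hypothetical. Plain Python lists replace tensors so the sketch runs without a deep-learning framework, so there is no autograd here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: a denoiser maps (noisy latent x_t, timestep t) to predicted noise.
Denoiser = Callable[[Sequence[float], int], List[float]]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def add_noise(x0: Sequence[float], eps: Sequence[float], alpha_bar: float) -> List[float]:
    """Forward diffusion sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [s * x + n * e for x, e in zip(x0, eps)]


def kd_loss(
    student: Denoiser,
    teacher: Denoiser,
    x0: Sequence[float],
    t: int,
    alpha_bar: float,
    weight: float = 0.5,
    rng: Optional[random.Random] = None,
) -> float:
    """Blend of teacher matching and the usual denoising loss (hypothetical setup).

    weight=1.0 trains purely on the teacher's output; weight=0.0 is plain
    epsilon-prediction training with no teacher.
    """
    if rng is None:
        rng = random.Random()
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # one noise draw shared by both models
    x_t = add_noise(x0, eps, alpha_bar)
    eps_student = student(x_t, t)
    eps_teacher = teacher(x_t, t)            # in real training the teacher stays frozen
    return weight * mse(eps_student, eps_teacher) + (1.0 - weight) * mse(eps_student, eps)


# Usage with toy stand-ins for a large teacher DiT and a compact student DiT.
teacher = lambda x_t, t: [0.9 * v for v in x_t]
student = lambda x_t, t: [0.8 * v for v in x_t]
loss = kd_loss(student, teacher, x0=[0.1, -0.3, 0.5], t=500, alpha_bar=0.5,
               rng=random.Random(0))
print(f"distillation loss: {loss:.4f}")
```

Sharing one noise draw between teacher and student is what makes the teacher term a direct output comparison; in a framework such as PyTorch, the same structure would apply with tensors and a frozen teacher.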
