SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Dongting Hu; Aarush Gupta; Magzhan Gabidolla; Arpit Sahni; Huseyin Coskun; Yanyu Li; Yerlan Idelbayev; Ahsan Mahmood; Aleksei Lebedev; Dishani Lahiri; Anujraaj Goyal; Ju Hu; Mingming Gong; Sergey Tulyakov; Anil Kag

arXiv:2601.08303 · cs.CV · February 12, 2026

Open Access

TL;DR

This paper introduces SnapGen++, a diffusion transformer (DiT) framework optimized for edge devices. It preserves high image quality while substantially reducing compute and memory requirements by combining a compact sparse-attention architecture, an elastic training scheme, and a distillation method.

Contribution

The paper presents three components for efficient high-fidelity image generation: a compact DiT architecture with adaptive global-local sparse attention, an elastic training framework that jointly trains sub-models of different capacities within one supernetwork, and Knowledge-Guided Distribution Matching Distillation.
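
To make the idea of mixing global and local attention concrete, here is a minimal NumPy sketch of a global-local sparse attention mask. This page does not describe the paper's actual mechanism, so the scheme is an assumption: each token attends to its neighbors within a small window on the latent token grid (local detail) plus a fixed strided subset of tokens that every query can see (global context). The names `global_local_mask` and `masked_attention` and the `window` and `stride` parameters are hypothetical, and the "adaptive" part of the paper's design is not modeled.

```python
import numpy as np


def global_local_mask(h: int, w: int, window: int, stride: int) -> np.ndarray:
    """Boolean (N, N) attention mask over an h x w token grid, with N = h * w.

    Hypothetical scheme: query i may attend to key j if j lies in the
    (2 * window + 1)^2 neighborhood of i on the grid (local detail) or if j
    sits on a coarse sub-grid sampled every `stride` tokens (global context).
    """
    rows, cols = np.divmod(np.arange(h * w), w)
    # Local term: Chebyshev distance between grid positions is at most `window`.
    local = (np.abs(rows[:, None] - rows[None, :]) <= window) & (
        np.abs(cols[:, None] - cols[None, :]) <= window
    )
    # Global term: keys on the strided sub-grid are visible to every query.
    is_global_key = (rows % stride == 0) & (cols % stride == 0)
    return local | is_global_key[None, :]


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention restricted to the (query, key) pairs allowed by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)          # drop disallowed pairs
    scores -= scores.max(axis=-1, keepdims=True)      # stable softmax; assumes each row allows at least one key
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = w = 8                                  # an 8 x 8 grid of latent tokens
    q, k, v = (rng.standard_normal((h * w, 16)) for _ in range(3))
    mask = global_local_mask(h, w, window=1, stride=4)
    out = masked_attention(q, k, v, mask)
    print(out.shape)                           # (64, 16)
    print(f"fraction of query-key pairs kept: {mask.mean():.3f}")
```

The sketch builds a dense N × N mask for readability, so it shows which query-key pairs are kept but not the compute savings; an efficient implementation would skip the masked blocks with a block-sparse kernel instead of computing and discarding them.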

Findings

1. Achieves transformer-level generation quality under strict compute and memory budgets.
2. Enables real-time image generation on mobile devices.
3. Offers a better efficiency-quality trade-off than existing models.

Abstract

Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet these models remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT framework tailored for mobile and edge devices that achieves transformer-level generation quality under strict resource constraints. Our design combines three key components. First, we propose a compact DiT architecture with an adaptive global-local sparse attention mechanism that balances global context modeling and local detail preservation. Second, we propose an elastic training framework that jointly optimizes sub-DiTs of varying capacities within a unified supernetwork, allowing a single model to dynamically adjust for efficient inference across different hardware. Finally, we develop Knowledge-Guided Distribution Matching Distillation, a…
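
The elastic training idea, several sub-networks of different capacity sharing one set of weights and trained together, can be illustrated with a toy example. The sketch below is not the paper's method: it assumes, as in width-sliced (slimmable) networks, that a sub-network of hidden width k reuses the first k hidden units of the full model, and it jointly optimizes a fixed set of widths by summing their gradients at every step. `ElasticMLP`, `train_elastic`, `mse`, and the chosen widths are hypothetical, and a small MLP stands in for a DiT.

```python
import numpy as np


class ElasticMLP:
    """Two-layer ReLU MLP whose hidden width can be truncated at inference time.

    Hypothetical illustration of weight sharing in a supernetwork: the
    sub-network of width k uses only the first k hidden units, so every
    sub-network is a slice of the same parameter arrays.
    """

    def __init__(self, d_in: int, d_hidden: int, d_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
        self.w2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)

    def forward(self, x: np.ndarray, width: int):
        """Run the width-`width` sub-network; returns predictions and a cache for grads."""
        h_pre = x @ self.w1[:width].T          # (batch, width)
        h = np.maximum(h_pre, 0.0)              # ReLU
        y_pred = h @ self.w2[:, :width].T       # (batch, d_out)
        return y_pred, (x, h_pre, h)

    def grads(self, y_pred, y, cache, width: int):
        """MSE gradients w.r.t. the full weights; entries outside the slice stay zero."""
        x, h_pre, h = cache
        g_out = 2.0 * (y_pred - y) / y.size     # d(mean squared error)/d(y_pred)
        g_w2 = np.zeros_like(self.w2)
        g_w2[:, :width] = g_out.T @ h
        g_h_pre = (g_out @ self.w2[:, :width]) * (h_pre > 0)
        g_w1 = np.zeros_like(self.w1)
        g_w1[:width] = g_h_pre.T @ x
        return g_w1, g_w2


def mse(model: ElasticMLP, x, y, width: int) -> float:
    """Mean squared error of the width-`width` sub-network."""
    return float(np.mean((model.forward(x, width)[0] - y) ** 2))


def train_elastic(model: ElasticMLP, x, y, widths, steps: int = 500, lr: float = 0.01):
    """Jointly optimize all listed sub-networks by summing their gradients each step."""
    for _ in range(steps):
        g1_total = np.zeros_like(model.w1)
        g2_total = np.zeros_like(model.w2)
        for width in widths:
            y_pred, cache = model.forward(x, width)
            g1, g2 = model.grads(y_pred, y, cache, width)
            g1_total += g1
            g2_total += g2
        model.w1 -= lr * g1_total
        model.w2 -= lr * g2_total
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((256, 4))
    y = np.sin(x.sum(axis=1, keepdims=True))   # toy regression target
    model = ElasticMLP(d_in=4, d_hidden=64, d_out=1)
    widths = (16, 32, 64)                      # hypothetical capacity levels
    before = {w: mse(model, x, y, w) for w in widths}
    train_elastic(model, x, y, widths)
    for w in widths:
        print(f"width {w:2d}: MSE {before[w]:.3f} -> {mse(model, x, y, w):.3f}")
```

Because every sub-network is a prefix slice of the same arrays, a deployment target can pick a width at inference time without storing separate models; the cost is that the leading units must serve all widths at once, which constrains how they are trained.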

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Neuroimaging Techniques and Applications