NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices

Ruchika Chavhan; Malcolm Chadwick; Alberto Gil Couto Pimentel Ramos; Luca Morreale; Mehdi Noroozi; Abhinav Mehrotra

arXiv:2602.06879·cs.CV·February 9, 2026

Open Access

TL;DR

NanoFLUX is a compact 2.4B-parameter text-to-image flow-matching model distilled from the 17B FLUX.1-Schnell, enabling high-quality image generation on mobile devices with a much smaller footprint and lower latency.

Contribution

The paper introduces a progressive compression pipeline that combines pruning of the diffusion transformer, ResNet-based token downsampling, and text encoder distillation for efficient on-device text-to-image generation.

Findings

01 Generates 512x512 images in ~2.5 seconds on mobile devices

02 Reduces the diffusion transformer from 12B to 2B parameters (17B to 2.4B for the full model)

03 Maintains high visual quality despite compression
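
The size figures quoted on this page refer to two different scopes (the full model and the diffusion transformer alone), so a small arithmetic sketch may help keep them apart. The constants below are the rounded values from the abstract; the "other components" numbers are inferred by subtraction and are an assumption, not something the paper states here.

```python
# Parameter counts in billions, rounded as quoted in the abstract.
TEACHER_TOTAL_B = 17.0   # full FLUX.1-Schnell model
STUDENT_TOTAL_B = 2.4    # full NanoFLUX model
TEACHER_DIT_B = 12.0     # diffusion transformer before pruning
STUDENT_DIT_B = 2.0      # diffusion transformer after pruning


def reduction_factor(before_b: float, after_b: float) -> float:
    """Return how many times smaller the compressed component is."""
    return before_b / after_b


total_factor = reduction_factor(TEACHER_TOTAL_B, STUDENT_TOTAL_B)
dit_factor = reduction_factor(TEACHER_DIT_B, STUDENT_DIT_B)

# Assumption: everything outside the diffusion transformer (for example text
# encoders and the autoencoder) is obtained by subtracting the rounded figures.
teacher_other_b = round(TEACHER_TOTAL_B - TEACHER_DIT_B, 1)
student_other_b = round(STUDENT_TOTAL_B - STUDENT_DIT_B, 1)

print(f"full model:            {total_factor:.1f}x smaller")
print(f"diffusion transformer: {dit_factor:.1f}x smaller")
print(f"other components:      {teacher_other_b}B -> {student_other_b}B (inferred)")
```

Because the inputs are rounded, the resulting factors are only approximate.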

Abstract

While large-scale text-to-image diffusion models continue to improve in visual quality, their increasing scale has widened the gap between state-of-the-art models and on-device solutions. To address this gap, we introduce NanoFLUX, a 2.4B text-to-image flow-matching model distilled from 17B FLUX.1-Schnell using a progressive compression pipeline designed to preserve generation quality. Our contributions include: (1) A model compression strategy driven by pruning redundant components in the diffusion transformer, reducing its size from 12B to 2B; (2) A ResNet-based token downsampling mechanism that reduces latency by allowing intermediate blocks to operate on lower-resolution tokens while preserving high-resolution processing elsewhere; (3) A novel text encoder distillation approach that leverages visual signals from early layers of the denoiser during sampling. Empirically, NanoFLUX…
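
To give a rough sense of why running intermediate blocks on lower-resolution tokens reduces latency, here is a back-of-envelope sketch. It does not reproduce the paper's ResNet-based mechanism. The autoencoder stride of 8, the 2x2 patch size, and the 2x per-side downsampling factor are illustrative assumptions, and the function names are hypothetical.

```python
def token_count(image_size: int, vae_factor: int = 8, patch_size: int = 2) -> int:
    """Number of image tokens for a square image.

    vae_factor and patch_size are assumed values, not taken from the paper.
    """
    latent_side = image_size // vae_factor   # spatial side of the latent grid
    grid_side = latent_side // patch_size    # side of the token grid
    return grid_side * grid_side


def downsampled_token_count(tokens: int, factor: int = 2) -> int:
    """Tokens left after reducing each spatial side by `factor` (assumed 2)."""
    return tokens // (factor * factor)


def attention_pairs(tokens: int) -> int:
    """Query-key pairs in full self-attention, which grows quadratically."""
    return tokens * tokens


full_tokens = token_count(512)
low_tokens = downsampled_token_count(full_tokens)
pair_ratio = attention_pairs(full_tokens) // attention_pairs(low_tokens)

print(f"tokens at full resolution: {full_tokens}")
print(f"tokens after downsampling: {low_tokens}")
print(f"attention pairs reduced by a factor of {pair_ratio}")
```

Under these assumptions, blocks that operate on the downsampled tokens see a quarter of the sequence length, and their self-attention cost shrinks by roughly the square of that factor. Actual savings depend on how many blocks use the lower resolution and on the cost of the downsampling layers themselves.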

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Computer Graphics and Visualization Techniques