NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer

Shanyuan Liu; Jian Zhu; Junda Lu; Yue Gong; Liuzhuozheng Li; Bo Cheng; Yuhang Ma; Liebucha Wu; Xiaoyu Wu; Dawei Leng; Yuhui Yin

arXiv:2508.10424·cs.CV·August 15, 2025

TL;DR

NanoControl introduces a lightweight, efficient control framework for diffusion transformers, achieving state-of-the-art controllable text-to-image synthesis with minimal parameter overhead and computational costs.

Contribution

The paper proposes NanoControl, a control method that uses Flux as the backbone and a LoRA-style (low-rank adaptation) control module instead of a duplicated backbone, reducing overhead while improving controllability and generation quality.

Findings

01

Achieves state-of-the-art controllable text-to-image results.

02

Adds only 0.024% to the parameter count and 0.029% to GFLOPs, avoiding the overhead of ControlNet-style backbone duplication.

03

Maintains superior generation quality and controllability.

Abstract

Diffusion Transformers (DiTs) have demonstrated exceptional capabilities in text-to-image synthesis. However, in the domain of controllable text-to-image generation using DiTs, most existing methods still rely on the ControlNet paradigm originally designed for UNet-based diffusion models. This paradigm introduces significant parameter overhead and increased computational costs. To address these challenges, we propose the Nano Control Diffusion Transformer (NanoControl), which employs Flux as the backbone network. Our model achieves state-of-the-art controllable text-to-image generation performance while incurring only a 0.024% increase in parameter count and a 0.029% increase in GFLOPs, thus enabling highly efficient controllable generation. Specifically, rather than duplicating the DiT backbone for control, we design a LoRA-style (low-rank adaptation) control module that directly…
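
To give a rough sense of why a low-rank module is so much cheaper than a duplicated layer, the following minimal sketch counts parameters for a single linear projection. It is illustrative only: the hidden size `d`, the rank `rank`, and the helper names are assumptions made for this example, not values or code from the paper, and the result is not meant to reproduce the 0.024% figure reported in the abstract.

```python
# Illustrative parameter count: duplicating one linear layer versus
# adding a low-rank (LoRA-style) pair of matrices next to it.
# All sizes below are hypothetical, chosen only for the example.

def linear_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_in x d_out projection (bias ignored)."""
    return d_in * d_out

def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in a rank-r factorisation: A is d_in x r, B is r x d_out."""
    return rank * (d_in + d_out)

def overhead_percent(base: int, extra: int) -> float:
    """Extra parameters expressed as a percentage of the base count."""
    return 100.0 * extra / base

d = 3072      # hypothetical hidden size
rank = 4      # hypothetical adapter rank

full_copy = linear_params(d, d)          # cost of duplicating the layer
adapter = low_rank_params(d, d, rank)    # cost of the low-rank module

print(f"duplicated layer : {full_copy} parameters")
print(f"low-rank adapter : {adapter} parameters")
print(f"adapter overhead : {overhead_percent(full_copy, adapter):.4f}% of the layer")
```

Because the adapter cost grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, the relative overhead shrinks as the hidden size increases, which is the general property a LoRA-style control module relies on.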
