NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer
Shanyuan Liu, Jian Zhu, Junda Lu, Yue Gong, Liuzhuozheng Li, Bo Cheng, Yuhang Ma, Liebucha Wu, Xiaoyu Wu, Dawei Leng, Yuhui Yin

TL;DR
NanoControl introduces a lightweight, efficient control framework for diffusion transformers, achieving state-of-the-art controllable text-to-image synthesis with minimal parameter overhead and computational costs.
Contribution
It proposes NanoControl, a control method built on a Flux backbone with a LoRA-style (low-rank adaptation) control module, which reduces overhead while improving controllability and quality.
Findings
Achieves state-of-the-art controllable text-to-image results.
Adds only about 0.024% to the parameter count and 0.029% to GFLOPs.
Maintains superior generation quality and controllability.
Abstract
Diffusion Transformers (DiTs) have demonstrated exceptional capabilities in text-to-image synthesis. However, in the domain of controllable text-to-image generation using DiTs, most existing methods still rely on the ControlNet paradigm originally designed for UNet-based diffusion models. This paradigm introduces significant parameter overhead and increased computational costs. To address these challenges, we propose the Nano Control Diffusion Transformer (NanoControl), which employs Flux as the backbone network. Our model achieves state-of-the-art controllable text-to-image generation performance while incurring only a 0.024% increase in parameter count and a 0.029% increase in GFLOPs, thus enabling highly efficient controllable generation. Specifically, rather than duplicating the DiT backbone for control, we design a LoRA-style (low-rank adaptation) control module that directly…
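To make the low-rank idea in the abstract concrete, here is a minimal sketch of generic low-rank adaptation, not the authors' control module (whose details are cut off above). It shows why a rank-r update to a dense layer adds few parameters: the update is factored into two thin matrices. The function names, the layer size, and the rank are all illustrative assumptions and are not taken from the paper.

```python
# Generic LoRA-style sketch (assumption: plain low-rank update y = W x + scale * B (A x)).
# This is NOT NanoControl's actual module; names and sizes are hypothetical.

def dense_params(d_in: int, d_out: int) -> int:
    """Parameter count of a dense weight matrix W with shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count of the low-rank pair A (rank, d_in) and B (d_out, rank)."""
    return rank * (d_in + d_out)


def relative_overhead(d_in: int, d_out: int, rank: int) -> float:
    """Extra parameters added by the low-rank pair, as a fraction of the dense layer."""
    return lora_params(d_in, d_out, rank) / dense_params(d_in, d_out)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Frozen base projection plus a trainable low-rank correction."""
    base = matvec(W, x)               # W x, the unchanged backbone path
    delta = matvec(B, matvec(A, x))   # B (A x), the low-rank path
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_in, d_out, rank = 3072, 3072, 4
    print("dense parameters:   ", dense_params(d_in, d_out))
    print("low-rank parameters:", lora_params(d_in, d_out, rank))
    print("relative overhead:  ", relative_overhead(d_in, d_out, rank))
```

The overhead ratio simplifies to rank * (d_in + d_out) / (d_in * d_out), so it shrinks as the layer gets wider. The percentages reported in the abstract are measured against the whole model and depend on where the authors attach their module, so this per-layer figure should not be read as a reproduction of them.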
