ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, and Jiaya Jia

TL;DR
ControlNeXt introduces a streamlined, resource-efficient approach for controllable image and video generation, significantly reducing training complexity and enabling seamless style modifications without extra training.
Contribution
It presents a simplified architecture that adds minimal cost over the base model, reduces learnable parameters by up to 90%, and introduces Cross Normalization for faster and more stable training convergence.
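This page names Cross Normalization but does not define it. The sketch below is a minimal illustration under a stated assumption, not the paper's implementation. It assumes that Cross Normalization standardizes the control-branch features with the mean and variance of the main (denoising) branch before adding them, instead of with the control features' own statistics. The functions `cross_normalize` and `inject_control` and the scale/shift parameters `gamma` and `beta` are hypothetical, and flat Python lists stand in for feature maps.

```python
import math
from statistics import fmean, pvariance


def cross_normalize(control, main, gamma=1.0, beta=0.0, eps=1e-5):
    """Hypothetical sketch: normalize `control` using statistics of `main`.

    Assumption: mean and (population) variance are taken from the main
    branch, not from the control features themselves.
    """
    mu = fmean(main)
    var = pvariance(main, mu)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (c - mu) * inv_std + beta for c in control]


def inject_control(main, control, **kwargs):
    """Hypothetical sketch: add the aligned control features to the main branch."""
    aligned = cross_normalize(control, main, **kwargs)
    return [m + a for m, a in zip(main, aligned)]


main_feats = [1.0, 2.0, 3.0, 4.0]      # toy main-branch activations
control_feats = [2.5, 2.5, 2.5, 2.5]   # toy control activations at the main mean
print(inject_control(main_feats, control_feats))  # prints [1.0, 2.0, 3.0, 4.0]
```

Under this assumption, normalizing with the main branch's statistics presumably keeps the injected signal on the same scale as the activations it is added to. Consult the paper for the exact formulation.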
Findings
Demonstrates robustness across various models and data types.
Achieves up to 90% reduction in learnable parameters.
Enables style alteration without additional training.
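A reduction of "up to 90%" means that the method trains at most one tenth as many parameters as the baseline it is compared against. The small helper below states that arithmetic explicitly. Its name is hypothetical, and the parameter counts in the usage line are invented for illustration. They are not figures from the paper.

```python
def learnable_param_reduction(baseline, new):
    """Fraction of learnable parameters removed relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - new / baseline


# Hypothetical counts: a baseline with 1,000,000 trainable parameters
# versus a method that trains 100,000 of them.
print(f"{learnable_param_reduction(1_000_000, 100_000):.0%}")  # prints 90%
```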
Abstract
Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require substantial additional computational resources, especially for video generation, and face challenges in training or exhibit weak control. In this paper, we propose ControlNeXt: a powerful and efficient method for controllable image and video generation. We first design a more straightforward and efficient architecture, replacing heavy additional branches with minimal additional cost compared to the base model. Such a concise structure also allows our method to seamlessly integrate with other LoRA weights, enabling style alteration without the…
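The abstract says the concise structure lets ControlNeXt work alongside LoRA weights, so styles can change without extra training. For illustration only, the sketch below shows the standard LoRA merge, W' = W + (alpha / r) · B A, which a style LoRA applies to base-model weights. It assumes, although this page does not say so, that ControlNeXt leaves those base layers in place. Under that assumption the merge proceeds as usual while the control module is used unchanged. The function names are hypothetical, and plain lists stand in for tensors.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A, the standard LoRA weight merge.

    weight: d_out x d_in, lora_b: d_out x r, lora_a: r x d_in.
    """
    rank = len(lora_a)          # r = number of rows in A
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]   # toy base-model weight
B = [[1.0], [0.0]]             # toy LoRA "up" matrix, rank 1
A = [[0.0, 2.0]]               # toy LoRA "down" matrix
print(merge_lora(W, B, A, alpha=1.0))  # prints [[1.0, 2.0], [0.0, 1.0]]
```

Because the merged result has the same shape as `W`, a model that only adds a small side module can, under the stated assumption, reuse such style weights without retraining.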
Code & Models
- 🤗 Eugeoter/controlnext-sdxl-anime-canny (model) · 75 downloads · 3 likes
- 🤗 Eugeoter/controlnext-sdxl-vidit-depth (model) · 43 downloads · 1 like
- 🤗 Eugeoter/controlnext-sd1.5-vidit-depth (model) · 9 downloads · 1 like
- 🤗 Eugeoter/controlnext-sd1.5-deepfashoin-mask (model) · 9 downloads
- 🤗 Eugeoter/controlnext-sd1.5-deepfashion-multiview (model) · 13 downloads
- 🤗 Eugeoter/controlnext-sd1.5-deepfashion-caption (model) · 18 downloads
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Medical Image Segmentation Techniques
Methods: Balanced Selection
