ROICtrl: Boosting Instance Control for Visual Generation
Yuchao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou

TL;DR
ROICtrl enhances diffusion models with regional instance control using ROI-Unpool, enabling precise, efficient multi-instance visual generation with reduced computational costs.
Contribution
Introduces ROI-Unpool and ROICtrl, novel methods for explicit regional control in diffusion models, improving accuracy and efficiency in multi-instance visual generation.
Findings
Superior regional control performance demonstrated
Significant reduction in computational costs
Compatibility with existing diffusion model add-ons
Abstract
Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances. To address this limitation, this work enhances diffusion models by introducing regional instance control, where each instance is governed by a bounding box paired with a free-form caption. Previous methods in this area typically rely on implicit position encoding or explicit attention masks to separate regions of interest (ROIs), resulting in either inaccurate coordinate injection or large computational overhead. Inspired by ROI-Align in object detection, we introduce a complementary operation called ROI-Unpool. Together, ROI-Align and ROI-Unpool enable explicit, efficient, and accurate ROI manipulation on high-resolution feature maps for visual…
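To make the pairing of the two operations concrete, the sketch below crops a fixed-size feature from a bounding box and then places it back at the box location on a blank canvas. It is only an illustration under stated assumptions, not the authors' implementation: the function names `roi_crop_resize` and `roi_unpool` are hypothetical, nearest-neighbour sampling stands in for the bilinear sampling that ROI-Align uses, boxes are assumed to be normalized `(x0, y0, x1, y1)` coordinates, and the reading of ROI-Unpool as "paste the ROI feature back to its original position" is an assumption drawn from the abstract's description of it as complementary to ROI-Align.

```python
import numpy as np


def roi_crop_resize(feat, box, out_size):
    """Simplified stand-in for ROI-Align (nearest neighbour, not bilinear).

    feat: (H, W, C) feature map; box: normalized (x0, y0, x1, y1).
    Returns an (out_size, out_size, C) feature sampled inside the box.
    """
    H, W, _ = feat.shape
    x0, y0, x1, y1 = box
    # Centres of an out_size x out_size sampling grid, in pixel units.
    ys = y0 * H + (np.arange(out_size) + 0.5) * (y1 - y0) * H / out_size
    xs = x0 * W + (np.arange(out_size) + 0.5) * (x1 - x0) * W / out_size
    yi = np.clip(np.floor(ys).astype(int), 0, H - 1)
    xi = np.clip(np.floor(xs).astype(int), 0, W - 1)
    return feat[yi[:, None], xi[None, :]]


def roi_unpool(roi_feat, box, out_hw):
    """Hypothetical complement: place a fixed-size ROI feature back at its box.

    Returns (canvas, mask): canvas is (H, W, C) and zero outside the box;
    mask is a boolean (H, W) array marking the pixels the box covers.
    """
    H, W = out_hw
    S, _, C = roi_feat.shape
    x0, y0, x1, y1 = box
    # Normalized centres of every pixel on the output canvas.
    py = (np.arange(H) + 0.5) / H
    px = (np.arange(W) + 0.5) / W
    mask = ((py >= y0) & (py < y1))[:, None] & ((px >= x0) & (px < x1))[None, :]
    # Map each canvas pixel to the ROI cell it falls into (clipped outside the box).
    ry = np.clip(((py - y0) / (y1 - y0) * S).astype(int), 0, S - 1)
    rx = np.clip(((px - x0) / (x1 - x0) * S).astype(int), 0, S - 1)
    canvas = np.zeros((H, W, C), dtype=roi_feat.dtype)
    canvas[mask] = roi_feat[ry[:, None], rx[None, :]][mask]
    return canvas, mask
```

In this toy version the cost of handling an instance depends on the fixed ROI size rather than on the full feature-map resolution, which is one plausible reading of why working on cropped ROIs could be cheaper than masking attention over the whole map.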
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · Adapter · Diffusion
