ReCo: Region-Controlled Text-to-Image Generation
Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

TL;DR
ReCo is a region-controlled text-to-image generation method that gives precise control over where objects appear and what they look like. It augments the input of an existing T2I model with position tokens and then fine-tunes the model.
Contribution
The paper presents ReCo, a new approach that enables flexible regional control in T2I models through position tokens and fine-tuning, improving accuracy and controllability over prior methods.
Findings
ReCo achieves higher image quality on COCO, with FID improving from 8.82 to 7.36 (lower is better).
Objects are placed more accurately, amounting to a 20.40% improvement in region classification accuracy.
ReCo better controls object count, spatial relationships, and region attributes through free-form regional descriptions.
Abstract
Recently, large-scale text-to-image (T2I) models have shown impressive performance in generating high-fidelity images, but with limited controllability, e.g., precisely specifying the content in a specific region with a free-form text description. In this paper, we propose an effective technique for such regional control in T2I generation. We augment T2I models' inputs with an extra set of position tokens, which represent quantized spatial coordinates. Each region is specified by four position tokens representing its top-left and bottom-right corners, followed by an open-ended natural language regional description. We then fine-tune a pre-trained T2I model with this new input interface. Our model, dubbed ReCo (Region-Controlled T2I), enables region control for arbitrary objects described by open-ended regional texts rather than by object labels from a constrained category…
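The input interface described in the abstract can be illustrated with a small sketch that turns a caption plus a list of boxes into one token sequence. This is not the authors' implementation. The bin count (`NUM_BINS = 1000`), the `<bin_k>` token spelling, the ordering (global caption first, then each region's four corner tokens and its description), and the helper names `quantize`, `position_tokens`, and `build_reco_prompt` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: each image axis is quantized into this many bins.
NUM_BINS = 1000

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def quantize(coord: float, size: int, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    normalized = min(max(coord / size, 0.0), 1.0)
    # Clamp so that coord == size lands in the last bin instead of overflowing.
    return min(int(normalized * num_bins), num_bins - 1)


def position_tokens(box: Box, width: int, height: int,
                    num_bins: int = NUM_BINS) -> List[str]:
    """Four hypothetical position tokens: top-left (x0, y0), bottom-right (x1, y1)."""
    x0, y0, x1, y1 = box
    bins = [
        quantize(x0, width, num_bins),
        quantize(y0, height, num_bins),
        quantize(x1, width, num_bins),
        quantize(y1, height, num_bins),
    ]
    return [f"<bin_{b}>" for b in bins]


def build_reco_prompt(caption: str,
                      regions: Sequence[Tuple[Box, str]],
                      width: int, height: int,
                      num_bins: int = NUM_BINS) -> str:
    """Caption, then for each region: four position tokens and a free-form description."""
    parts = [caption]
    for box, description in regions:
        parts.extend(position_tokens(box, width, height, num_bins))
        parts.append(description)
    return " ".join(parts)


if __name__ == "__main__":
    prompt = build_reco_prompt(
        "a cat on a sofa",
        [((0, 0, 256, 256), "a white cat with blue eyes")],
        width=512,
        height=512,
    )
    print(prompt)
```

Because the coordinates are discretized into a fixed vocabulary of position tokens, the fine-tuned text encoder can treat them like ordinary words that sit next to the regional text. That is why the regional description can be open-ended rather than drawn from a fixed label set.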
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis
