Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation

Ahmad Süleyman; Göksel Biricik

arXiv:2501.09194·cs.CV·February 11, 2025

Open Access

TL;DR

ObjectDiffusion is a novel model that enhances text-to-image diffusion by integrating semantic and spatial grounding, enabling precise control over object placement and improving image quality and diversity.

Contribution

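The paper introduces ObjectDiffusion, an approach that conditions diffusion models on semantic and spatial grounding information. It modifies the network architecture introduced in ControlNet to integrate the grounding method proposed in GLIGEN, with the aim of improving controllable image synthesis.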
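
As an illustration of what grounding information can look like in practice, the following minimal sketch pairs each object label with a bounding box normalized to the image size. It is not the authors' code: the names `GroundedObject`, `normalize_box`, and `build_grounding`, the `(x_min, y_min, x_max, y_max)` box convention, and the normalization to [0, 1] are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class GroundedObject:
    """Hypothetical container: one object label plus its bounding box."""
    label: str
    box: Box  # assumed convention: (x_min, y_min, x_max, y_max) in [0, 1]


def normalize_box(box_px: Sequence[float], width: int, height: int) -> Box:
    """Convert a pixel-space box to coordinates relative to the image size."""
    x0, y0, x1, y1 = box_px
    # Reject boxes that are empty, inverted, or outside the image.
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError(f"invalid box {tuple(box_px)} for a {width}x{height} image")
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def build_grounding(
    objects_px: Sequence[Tuple[str, Sequence[float]]], width: int, height: int
) -> List[GroundedObject]:
    """Turn (label, pixel box) pairs into normalized grounding entries."""
    return [
        GroundedObject(label, normalize_box(box, width, height))
        for label, box in objects_px
    ]


# Example: two objects placed in a 512x512 image.
grounding = build_grounding(
    [("dog", (128, 64, 384, 448)), ("ball", (400, 400, 480, 480))],
    width=512,
    height=512,
)
```

A grounded model would consume such a list alongside the text caption. How the labels and boxes are encoded and injected into the network is specific to the paper and is not reproduced here.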

Findings

01. Achieves state-of-the-art metrics on the COCO2017 dataset

02. Demonstrates strong grounding and control in diverse contexts

03. Produces high-fidelity images with precise object placement

Abstract

Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions. Several layout-to-image models have been developed to control the generation process by utilizing a wide range of layouts, such as segmentation maps, edges, and human keypoints. In this work, we propose ObjectDiffusion, a model that conditions T2I diffusion models on semantic and spatial grounding information, enabling the precise rendering and placement of desired objects in specific locations defined by bounding boxes. To achieve this, we make substantial modifications to the network architecture introduced in ControlNet to integrate it with the grounding method proposed in GLIGEN. We fine-tune ObjectDiffusion on the COCO2017 training dataset and evaluate it on the COCO2017 validation dataset. Our model improves the precision and…

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion