InstanceGen: Image Generation with Instance-level Instructions

Etai Sella; Yanir Kleiman; Hadar Averbuch-Elor

arXiv:2505.05678 · cs.CV · May 20, 2025


TL;DR

InstanceGen introduces a method that combines image-based structural guidance with instance-level instructions produced by a large language model (LLM). The generated images accurately reflect complex, instance-level prompts, including object counts, attributes, and spatial relations.

Contribution

The paper presents an approach that couples a fine-grained structural initialization, obtained directly from a pretrained image generation model, with LLM-based instance-level instructions to improve the fidelity of generated images to the prompt.

Findings

01. Enhanced adherence to complex prompts with multiple objects and attributes

02. Better spatial and instance-level control in generated images

03. Outperforms existing methods in capturing detailed prompt semantics

Abstract

Despite rapid advancements in the capabilities of generative models, pretrained text-to-image models still struggle to capture the semantics conveyed by complex prompts that compound multiple objects and instance-level attributes. Consequently, we are witnessing growing interest in integrating additional structural constraints, typically in the form of coarse bounding boxes, to better guide the generation process in such challenging cases. In this work, we take the idea of structural guidance a step further by making the observation that contemporary image generation models can directly provide a plausible fine-grained structural initialization. We propose a technique that couples this image-based structural guidance with LLM-based instance-level instructions, yielding output images that adhere to all parts of the text prompt, including object counts, instance-level attributes, and…
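To make "instance-level instructions" and "adherence to object counts" concrete, here is a minimal sketch under stated assumptions. It assumes a hypothetical schema in which each object instance requested by the prompt is one record (category, attributes, optional normalized bounding box), and it assumes the categories found in a generated image come from some external object detector. The sketch only shows how count adherence could be checked. `InstanceInstruction` and `count_mismatches` are illustrative names; they are not part of InstanceGen or the listed repository, and this is not the paper's method.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class InstanceInstruction:
    """One object instance requested by the prompt (hypothetical schema)."""
    category: str
    attributes: list[str] = field(default_factory=list)
    # Optional coarse layout as (x0, y0, x1, y1) in normalized [0, 1] coordinates.
    box: tuple[float, float, float, float] | None = None


def count_mismatches(instructions: list[InstanceInstruction],
                     detected_categories: list[str]) -> dict[str, tuple[int, int]]:
    """Return {category: (expected, found)} for every category whose count differs.

    `detected_categories` is assumed to be the output of an external detector
    run on the generated image (one label per detected instance).
    """
    expected = Counter(inst.category for inst in instructions)
    found = Counter(detected_categories)
    # Counter returns 0 for missing keys, so extra or missing categories both show up.
    return {
        cat: (expected[cat], found[cat])
        for cat in expected.keys() | found.keys()
        if expected[cat] != found[cat]
    }


if __name__ == "__main__":
    # Prompt: "three red apples and a blue bowl"
    prompt_instances = [InstanceInstruction("apple", ["red"]) for _ in range(3)]
    prompt_instances.append(InstanceInstruction("bowl", ["blue"]))
    print(count_mismatches(prompt_instances, ["apple", "apple", "bowl"]))
```

For the example prompt, a generated image in which only two apples are detected yields `{'apple': (3, 2)}`, flagging exactly the instance-level error that coarse, prompt-only conditioning tends to miss. Checking attributes or spatial relations would need per-instance detections rather than bare category labels.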


Code & Models

Repositories

tsunghan-wu/SLD (PyTorch, Official)
