DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control

Shiyan Du; Conghan Yue; Xinyu Cheng; Dongyu Zhang

arXiv:2602.18282 · cs.CV · February 23, 2026

TL;DR

DEIG is a novel framework for fine-grained, controllable multi-instance image generation that leverages instance-aware representations and attention mechanisms to improve spatial and semantic accuracy.

Contribution

DEIG introduces an instance detail extractor and a detail fusion module for enhanced semantic control and scene coherence in multi-instance generation.

Findings

01 Outperforms existing methods in spatial and semantic accuracy

02 Demonstrates strong compositional generalization

03 Easily integrates into diffusion-based pipelines

Abstract

Multi-Instance Generation has advanced significantly in spatial placement and attribute binding. However, existing approaches still face challenges in fine-grained semantic understanding, particularly when dealing with complex textual descriptions. To overcome these limitations, we propose DEIG, a novel framework for fine-grained and controllable multi-instance generation. DEIG integrates an Instance Detail Extractor (IDE) that transforms text encoder embeddings into compact, instance-aware representations, and a Detail Fusion Module (DFM) that applies instance-based masked attention to prevent attribute leakage across instances. These components enable DEIG to generate visually coherent multi-instance scenes that precisely match rich, localized textual descriptions. To support fine-grained supervision, we construct a high-quality dataset with detailed, compositional instance captions…
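The abstract says the DFM applies instance-based masked attention but does not describe how. As a rough illustration of the general idea only, the sketch below assumes standard scaled dot-product cross-attention in which each image location may attend only to text tokens that belong to its own instance. The function name, argument names, the use of `-1` for background, and the zero output for background locations are all hypothetical choices made for this example, not details taken from the paper.

```python
import numpy as np

def instance_masked_attention(queries, keys, values, region_ids, token_ids):
    """Illustrative masked cross-attention (not the paper's implementation).

    queries:    (P, d) features of P image locations
    keys:       (T, d) features of T instance text tokens
    values:     (T, d) values paired with the keys
    region_ids: (P,) instance index per location, -1 for background (assumed)
    token_ids:  (T,) instance index per token
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)               # (P, T) similarity

    # A location may only see tokens of the instance it lies in.
    allowed = region_ids[:, None] == token_ids[None, :]  # (P, T) boolean mask
    scores = np.where(allowed, scores, -np.inf)

    # Locations with no allowed token (background) get all-zero weights.
    has_any = allowed.any(axis=1, keepdims=True)
    row_max = np.where(has_any, scores.max(axis=1, keepdims=True), 0.0)
    weights = np.exp(scores - row_max)                   # masked entries become 0
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.where(has_any, weights / np.where(denom == 0, 1.0, denom), 0.0)

    return weights @ values, weights


# Toy usage: two instances, one background location.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(3, 8))
v = rng.normal(size=(3, 8))
region_ids = np.array([0, 0, 1, -1])  # locations 0-1: instance 0, 2: instance 1
token_ids = np.array([0, 1, 1])       # token 0: instance 0, tokens 1-2: instance 1
out, w = instance_masked_attention(q, k, v, region_ids, token_ids)
```

Masking the scores before the softmax makes the attention weight on every other instance's tokens exactly zero, which is one simple way to keep an attribute described for one instance from influencing the region of another.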

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Handwritten Text Recognition Techniques