DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control
Shiyan Du, Conghan Yue, Xinyu Cheng, Dongyu Zhang

TL;DR
DEIG is a novel framework for fine-grained, controllable multi-instance image generation that leverages instance-aware representations and attention mechanisms to improve spatial and semantic accuracy.
Contribution
DEIG introduces an instance detail extractor and a detail fusion module for enhanced semantic control and scene coherence in multi-instance generation.
Findings
Outperforms existing methods in spatial and semantic accuracy
Demonstrates strong compositional generalization
Easily integrates into diffusion-based pipelines
Abstract
Multi-Instance Generation has advanced significantly in spatial placement and attribute binding. However, existing approaches still face challenges in fine-grained semantic understanding, particularly when dealing with complex textual descriptions. To overcome these limitations, we propose DEIG, a novel framework for fine-grained and controllable multi-instance generation. DEIG integrates an Instance Detail Extractor (IDE) that transforms text encoder embeddings into compact, instance-aware representations, and a Detail Fusion Module (DFM) that applies instance-based masked attention to prevent attribute leakage across instances. These components enable DEIG to generate visually coherent multi-instance scenes that precisely match rich, localized textual descriptions. To support fine-grained supervision, we construct a high-quality dataset with detailed, compositional instance captions…
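The abstract credits instance-based masked attention with preventing attribute leakage. A minimal sketch of that general idea follows, in NumPy. It is not the authors' implementation. The function names `box_masks` and `instance_masked_attention`, the use of bounding boxes to define instance regions, and the reuse of instance tokens as both keys and values (no learned projections) are all assumptions made for illustration.

```python
import numpy as np

def box_masks(boxes, height, width):
    """Rasterize normalized (x0, y0, x1, y1) boxes into boolean masks, shape (N, H*W).

    Hypothetical helper: the source does not say how instance regions are specified.
    """
    ys = (np.arange(height) + 0.5) / height  # pixel-center coordinates in [0, 1]
    xs = (np.arange(width) + 0.5) / width
    masks = []
    for x0, y0, x1, y1 in boxes:
        inside_y = (ys >= y0) & (ys < y1)
        inside_x = (xs >= x0) & (xs < x1)
        masks.append(np.outer(inside_y, inside_x).reshape(-1))
    return np.stack(masks)

def instance_masked_attention(queries, inst_tokens, masks):
    """Cross-attention where each image token sees only instances that cover it.

    queries:     (P, D) image tokens
    inst_tokens: (N, T, D) per-instance text tokens, used as keys and values (assumption)
    masks:       (N, P) boolean instance regions
    Returns the attended output (P, D) and the attention weights (P, N*T).
    """
    P, D = queries.shape
    N, T, _ = inst_tokens.shape
    keys = inst_tokens.reshape(N * T, D)        # instance-major token order
    scores = queries @ keys.T / np.sqrt(D)      # (P, N*T)
    allowed = np.repeat(masks.T, T, axis=1)     # (P, N*T), same token order as keys
    scores = np.where(allowed, scores, -np.inf)

    # Image tokens covered by no instance get all-zero weights instead of NaNs.
    has_any = allowed.any(axis=1, keepdims=True)
    row_max = np.where(has_any, scores.max(axis=1, keepdims=True), 0.0)
    weights = np.exp(scores - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    weights = weights / np.where(denom > 0, denom, 1.0)
    return weights @ keys, weights

# Two instances occupying the left and right halves of a 4x4 latent grid.
rng = np.random.default_rng(0)
masks = box_masks([(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)], 4, 4)
queries = rng.normal(size=(16, 8))
inst_tokens = rng.normal(size=(2, 3, 8))
out, weights = instance_masked_attention(queries, inst_tokens, masks)
```

In this sketch, image tokens in the left half receive exactly zero weight on the second instance's text tokens, which is the mechanism by which one instance's attributes cannot reach another instance's region.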
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Handwritten Text Recognition Techniques
