EliGen: Entity-Level Controlled Image Generation with Regional Attention
Hong Zhang, Zhongjie Duan, Xingjun Wang, Yingda Chen, Yu Zhang

TL;DR
EliGen is an entity-level control framework for diffusion-based image generation. It combines regional attention with a new entity-annotated dataset so that individual entities within an image can be placed and described separately.
Contribution
The paper contributes regional attention for diffusion transformers (a mechanism that adds no parameters), a high-quality dataset with fine-grained spatial and semantic entity-level annotations, and an inpainting fusion pipeline that extends the method to multi-entity image inpainting.
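To make the idea of regional attention more concrete, here is a minimal NumPy sketch of one way a parameter-free attention mask could tie each entity prompt to an arbitrary-shaped spatial mask. It is an illustration under stated assumptions, not the authors' implementation: the token layout (global prompt tokens, then per-entity prompt tokens, then image tokens), the function names `build_regional_attention_mask` and `masked_attention`, and the exact visibility rules are all hypothetical.

```python
import numpy as np

def build_regional_attention_mask(entity_masks, tokens_per_entity, global_tokens):
    """Return a boolean (S, S) matrix; True means "query row may attend to key column".

    Assumed sequence layout (hypothetical, for illustration only):
    [global prompt tokens | entity 0 tokens | entity 1 tokens | ... | image tokens]
    entity_masks: boolean array of shape (E, H, W), one spatial mask per entity.
    """
    entity_masks = np.asarray(entity_masks, dtype=bool)
    num_entities, height, width = entity_masks.shape
    num_image = height * width
    num_text = global_tokens + num_entities * tokens_per_entity
    size = num_text + num_image
    allowed = np.zeros((size, size), dtype=bool)

    img = slice(num_text, size)
    glob = slice(0, global_tokens)
    # Image tokens see each other; the global prompt sees itself and the whole image.
    allowed[img, img] = True
    allowed[glob, glob] = True
    allowed[glob, img] = True
    allowed[img, glob] = True

    flat = entity_masks.reshape(num_entities, num_image)
    for e in range(num_entities):
        start = global_tokens + e * tokens_per_entity
        ent_idx = np.arange(start, start + tokens_per_entity)
        region_idx = num_text + np.flatnonzero(flat[e])
        # An entity prompt sees itself and only the image tokens inside its mask.
        allowed[np.ix_(ent_idx, ent_idx)] = True
        allowed[np.ix_(ent_idx, region_idx)] = True
        allowed[np.ix_(region_idx, ent_idx)] = True
    return allowed

def masked_attention(q, k, v, allowed):
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Usage: two entities on a 2x2 token grid.
masks = np.zeros((2, 2, 2), dtype=bool)
masks[0, 0, :] = True   # entity 0 covers the top row
masks[1, 1, 1] = True   # entity 1 covers the bottom-right token
allowed = build_regional_attention_mask(masks, tokens_per_entity=2, global_tokens=1)
```

Because the constraint lives entirely in the mask, the sketch introduces no learnable weights, which matches the "no additional parameters" property stated in the abstract. The mask shapes are arbitrary boolean arrays, so nothing restricts regions to boxes.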
Findings
Outperforms existing methods in spatial precision and image quality
Enables multi-entity inpainting and flexible integration with other models
Provides publicly available code, dataset, and models
Abstract
Recent advancements in diffusion models have significantly advanced text-to-image generation, yet global text prompts alone remain insufficient for achieving fine-grained control over individual entities within an image. To address this limitation, we present EliGen, a novel framework for Entity-level controlled image Generation. Firstly, we put forward regional attention, a mechanism for diffusion transformers that requires no additional parameters, seamlessly integrating entity prompts and arbitrary-shaped spatial masks. By contributing a high-quality dataset with fine-grained spatial and semantic entity-level annotations, we train EliGen to achieve robust and accurate entity-level manipulation, surpassing existing methods in both spatial precision and image quality. Additionally, we propose an inpainting fusion pipeline, extending its capabilities to multi-entity image inpainting…
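The abstract does not say how the inpainting fusion pipeline combines generated content with the input image. As a rough intuition only, the sketch below shows a generic mask-based blend: pixels inside the union of the entity masks come from the generated image, and everything else is kept from the original. The function name `fuse_inpainting` and the pixel-space formulation are assumptions made for illustration; the paper's pipeline may work differently (for example, on latents during denoising).

```python
import numpy as np

def fuse_inpainting(original, generated, entity_masks):
    """Generic mask blend (hypothetical helper, not the paper's pipeline).

    original, generated: arrays of shape (H, W, C).
    entity_masks: boolean array of shape (E, H, W), one mask per entity.
    """
    original = np.asarray(original, dtype=np.float32)
    generated = np.asarray(generated, dtype=np.float32)
    # Union of all entity regions, with a trailing axis to broadcast over channels.
    union = np.any(np.asarray(entity_masks, dtype=bool), axis=0)[..., None]
    return np.where(union, generated, original)

# Usage: two single-pixel entities on a 2x2 image.
original = np.zeros((2, 2, 3), dtype=np.float32)
generated = np.ones((2, 2, 3), dtype=np.float32)
masks = np.zeros((2, 2, 2), dtype=bool)
masks[0, 0, 0] = True
masks[1, 1, 1] = True
fused = fuse_inpainting(original, generated, masks)
```

Taking the union of the masks is what lets a single pass handle several entities at once; content outside every mask is left untouched.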
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
Methods: Diffusion · Inpainting
