TL;DR
PositionIC introduces a unified framework combining a novel dataset synthesis pipeline and a layout-aware diffusion model to achieve high-fidelity, spatially controllable multi-subject image customization with state-of-the-art results.
Contribution
The paper presents BMPDS, the first automatic pipeline for synthesizing position-annotated multi-subject datasets, and a lightweight diffusion framework with a visibility-aware attention mechanism for precise multi-subject placement.
Findings
Achieves state-of-the-art spatial precision and identity consistency.
Effectively decouples spatial embeddings from semantic features.
Sets new benchmarks in multi-entity image customization.
Abstract
Recent subject-driven image customization excels in fidelity, yet fine-grained instance-level spatial control remains an elusive challenge, hindering real-world applications. This limitation stems from two factors: a scarcity of scalable, position-annotated datasets, and the entanglement of identity and layout by global attention mechanisms. To this end, we introduce PositionIC, a unified framework for high-fidelity, spatially controllable multi-subject customization. First, we present BMPDS, the first automatic data-synthesis pipeline for position-annotated multi-subject datasets, effectively providing crucial spatial supervision. Second, we design a lightweight, layout-aware diffusion framework that integrates a novel visibility-aware attention mechanism. This mechanism explicitly models spatial relationships via a NeRF-inspired volumetric weight regulation to effectively decouple…
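The abstract is cut off before it describes the attention mechanism, so the following is background only and not the paper's implementation. In NeRF-style volume rendering, samples ordered front to back with opacities alpha_i receive the weight w_i = T_i * alpha_i, where the transmittance T_i is the product of (1 - alpha_j) over all earlier samples j < i. A minimal Python sketch of that weighting, in which the function name and the example opacities are hypothetical:

```python
import numpy as np

def volumetric_weights(alpha):
    """Standard NeRF compositing weights (background illustration only).

    alpha: opacities in [0, 1], ordered front to back.
    Returns w_i = T_i * alpha_i with T_i = prod_{j < i} (1 - alpha_j).
    """
    alpha = np.asarray(alpha, dtype=float)
    # Transmittance reaching each sample: exclusive cumulative product of (1 - alpha).
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    return transmittance * alpha

# Hypothetical opacities for three overlapping layers, front to back.
weights = volumetric_weights([0.5, 0.5, 1.0])
```

Earlier (front) samples attenuate the contribution of later ones, which is one plausible reading of "visibility-aware"; how PositionIC applies such weights inside attention is not stated in the text available here.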
