POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion

Andrea Rigo; Luca Stornaiuolo; Weijie Wang; Mauro Martino; Bruno Lepri; Nicu Sebe

arXiv:2601.14056 · cs.CV · January 21, 2026

Open Access

TL;DR

POCI-Diff introduces a diffusion-based framework for text-to-image generation that ensures consistent, interactive 3D layout control and editing, effectively maintaining object geometry and identity across complex multi-object scenes.

Contribution

The paper presents a unified diffusion approach that jointly enforces 3D geometric constraints and instance-level semantic binding, improving layout adherence and object consistency in scene synthesis.

Findings

01. Outperforms state-of-the-art in visual fidelity and layout adherence.

02. Supports object insertion, removal, and transformation via regeneration.

03. Maintains object identity and scene coherence across edits.

Abstract

We propose a diffusion-based approach for Text-to-Image (T2I) generation with consistent and interactive 3D layout control and editing. While prior methods improve spatial adherence using 2D cues or iterative copy-warp-paste strategies, they often distort object geometry and fail to preserve consistency across edits. To address these limitations, we introduce a framework for Positioning Objects Consistently and Interactively (POCI-Diff), a novel formulation for jointly enforcing 3D geometric constraints and instance-level semantic binding within a unified diffusion process. Our method enables explicit per-object semantic control by binding individual text descriptions to specific 3D bounding boxes through Blended Latent Diffusion, allowing one-shot synthesis of complex multi-object scenes. We further propose a warping-free generative editing pipeline that supports object insertion,…
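The binding of a text description to a 3D bounding box can be illustrated with a small sketch. It is not the authors' implementation: the pinhole camera convention, the function names `project_box_to_mask` and `blend`, and the toy 8x8 "latents" are all assumptions made for illustration. The sketch projects an axis-aligned 3D box to a binary mask on the latent grid, then applies the mask-weighted blending rule used in Blended Latent Diffusion, which keeps the object-conditioned latent inside the mask and the background latent outside it.

```python
from itertools import product


def project_box_to_mask(center, size, focal, grid):
    """Project an axis-aligned 3D box to a binary mask on a grid x grid latent.

    Assumed convention (hypothetical): pinhole camera at the origin looking
    down +z, with image-plane coordinates in [-1, 1] mapped onto the grid.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    # Eight corners of the box.
    corners = [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx, dy, dz in product((-1, 1), repeat=3)
    ]
    # Perspective projection of each corner (every corner must have z > 0).
    us = [focal * x / z for x, _, z in corners]
    vs = [focal * y / z for _, y, z in corners]

    def to_cell(t):
        # Map a coordinate in [-1, 1] to a grid index, clamped to the grid.
        return min(grid - 1, max(0, int((t + 1) / 2 * grid)))

    c0, c1 = to_cell(min(us)), to_cell(max(us))
    r0, r1 = to_cell(min(vs)), to_cell(max(vs))
    return [
        [1.0 if r0 <= r <= r1 and c0 <= c <= c1 else 0.0 for c in range(grid)]
        for r in range(grid)
    ]


def blend(background, foreground, mask):
    """Blending step: foreground inside the mask, background outside."""
    return [
        [m * f + (1.0 - m) * b for b, f, m in zip(brow, frow, mrow)]
        for brow, frow, mrow in zip(background, foreground, mask)
    ]


# Toy usage: constant "latents" stand in for real diffusion latents.
grid = 8
mask = project_box_to_mask(
    center=(0.0, 0.0, 4.0), size=(2.0, 2.0, 2.0), focal=1.0, grid=grid
)
background = [[0.0] * grid for _ in range(grid)]
foreground = [[1.0] * grid for _ in range(grid)]
blended = blend(background, foreground, mask)
```

In an actual diffusion pipeline the blend would be applied at each denoising step to latents produced under the per-object prompt and the scene prompt; here the constant arrays only make the masking behaviour visible.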

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques