PLACID: Identity-Preserving Multi-Object Compositing via Video Diffusion with Synthetic Trajectories

Gemma Canet Tarrés; Manel Baradad; Francesc Moreno-Noguer; Yumeng Li

arXiv:2602.00267·cs.CV·February 3, 2026

Open Access

TL;DR

PLACID is a novel framework that uses video diffusion models and synthetic data to produce high-quality, identity-preserving multi-object composites with accurate layouts and visual fidelity.

Contribution

PLACID combines a pretrained video diffusion model with synthetic data curation to improve multi-object compositing.

Findings

01

Outperforms state-of-the-art methods in identity and background preservation

02

Produces more accurate object layouts and sizes

03

Achieves higher user satisfaction in evaluations

Abstract

Recent advances in generative AI have dramatically improved photorealistic image synthesis, yet they fall short for studio-level multi-object compositing. This task simultaneously demands (i) near-perfect preservation of each item's identity, (ii) precise background and color fidelity, (iii) control over layout and design elements, and (iv) complete, appealing displays showcasing all objects. However, current state-of-the-art models often alter object details, omit or duplicate objects, and produce layouts with incorrect relative sizing or inconsistent item presentations. To bridge this gap, we introduce PLACID, a framework that transforms a collection of object images into an appealing multi-object composite. Our approach makes two main contributions. First, we leverage a pretrained image-to-video (I2V) diffusion model with text control to preserve object consistency, identities, and…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Multimodal Machine Learning Applications