PLACID: Identity-Preserving Multi-Object Compositing via Video Diffusion with Synthetic Trajectories
Gemma Canet Tarrés, Manel Baradad, Francesc Moreno-Noguer, Yumeng Li

TL;DR
PLACID is a novel framework that uses video diffusion models and synthetic data to produce high-quality, identity-preserving multi-object composites with accurate layouts and visual fidelity.
Contribution
It introduces a new approach combining pretrained video diffusion models with synthetic data curation for improved multi-object compositing.
Findings
Outperforms state-of-the-art methods in identity and background preservation
Produces more accurate object layouts and sizes
Achieves higher user satisfaction in evaluations
Abstract
Recent advances in generative AI have dramatically improved photorealistic image synthesis, yet they fall short for studio-level multi-object compositing. This task demands simultaneous (i) near-perfect preservation of each item's identity, (ii) precise background and color fidelity, (iii) layout and design elements control, and (iv) complete, appealing displays showcasing all objects. However, current state-of-the-art models often alter object details, omit or duplicate objects, and produce layouts with incorrect relative sizing or inconsistent item presentations. To bridge this gap, we introduce PLACID, a framework that transforms a collection of object images into an appealing multi-object composite. Our approach makes two main contributions. First, we leverage a pretrained image-to-video (I2V) diffusion model with text control to preserve object consistency, identities, and…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Multimodal Machine Learning Applications
