Multitwine: Multi-Object Compositing with Text and Layout Control

Gemma Canet Tarrés; Zhe Lin; Zhifei Zhang; He Zhang; Andrew Gilbert; John Collomosse; Soo Ye Kim

arXiv:2502.05165·cs.CV·February 10, 2025

Open Access

TL;DR

Multitwine is a novel generative model that enables multi-object scene compositing guided by text and layout, supporting complex interactions and autonomous prop generation, with state-of-the-art performance.

Contribution

The paper introduces the first model for multi-object compositing with text and layout control, combining compositing and subject-driven generation in a unified framework.

Findings

01. Achieves state-of-the-art results in multi-object compositing.

02. Supports complex interactions and autonomous prop generation.

03. Uses a new data synthesis pipeline for training.

Abstract

We introduce the first generative model capable of simultaneous multi-object compositing, guided by both text and layout. Our model allows for the addition of multiple objects within a scene, capturing a range of interactions from simple positional relations (e.g., next to, in front of) to complex actions requiring reposing (e.g., hugging, playing guitar). When an interaction implies additional props, like 'taking a selfie', our model autonomously generates these supporting objects. By jointly training for compositing and subject-driven generation, also known as customization, we achieve a more balanced integration of textual and visual inputs for text-driven object compositing. As a result, we obtain a versatile model with state-of-the-art performance in both tasks. We further present a data generation pipeline leveraging visual and language models to effortlessly synthesize…
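To make "guided by both text and layout" concrete, the sketch below shows one way the inputs to such a system could be represented: a text prompt plus one bounding box per object. Every name here (`ObjectPlacement`, `CompositingRequest`, `validate`, `boxes_overlap`) is hypothetical and chosen for illustration; the abstract does not describe the model's actual interface, and normalized box coordinates are an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration only; not the paper's interface.
# Boxes are (x0, y0, x1, y1), assumed normalized to [0, 1].
BBox = Tuple[float, float, float, float]


@dataclass
class ObjectPlacement:
    name: str   # label for the object image, e.g. "person"
    bbox: BBox  # where the object should appear in the scene


@dataclass
class CompositingRequest:
    prompt: str                        # text describing the interaction
    placements: List[ObjectPlacement]  # one layout box per object


def is_valid_bbox(bbox: BBox) -> bool:
    """A box is valid if it has positive area and lies inside the unit square."""
    x0, y0, x1, y1 = bbox
    return 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two boxes share positive area, as close interactions often require."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def validate(request: CompositingRequest) -> List[str]:
    """Return a list of problems found in the request; empty means it looks usable."""
    problems = []
    if not request.prompt.strip():
        problems.append("empty prompt")
    if not request.placements:
        problems.append("no objects to composite")
    for placement in request.placements:
        if not is_valid_bbox(placement.bbox):
            problems.append(f"invalid bbox for {placement.name}")
    return problems


request = CompositingRequest(
    prompt="a person hugging a dog",
    placements=[
        ObjectPlacement("person", (0.10, 0.20, 0.55, 0.95)),
        ObjectPlacement("dog", (0.45, 0.50, 0.85, 0.95)),
    ],
)
print(validate(request))
print(boxes_overlap(request.placements[0].bbox, request.placements[1].bbox))
```

The overlap check reflects the kind of layout an interaction like hugging implies: the two object boxes intersect, so the objects cannot simply be pasted in independently and must be generated jointly.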

Taxonomy

Topics: Multimedia Communication and Technology