IP-Composer: Semantic Composition of Visual Concepts
Sara Dorfman, Dana Cohen-Bar, Rinon Gal, Daniel Cohen-Or

TL;DR
IP-Composer is a training-free method that combines multiple images and natural language to generate new images with precise control over complex visual concept compositions.
Contribution
It extends IP-Adapter to handle multiple visual inputs using composite CLIP embeddings, enabling more accurate and diverse image synthesis without additional training.
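The summary above does not say how a composite CLIP embedding is assembled, so the following is only a minimal sketch under a stated assumption: that each concept corresponds to a linear subspace of the embedding space, spanned by a few text-derived direction vectors, and that composing means swapping the base image's component in that subspace for the concept image's component. The function names (`orthonormal_basis`, `project`, `compose`) and the toy 3-D vectors are hypothetical stand-ins. No real CLIP encoder or IP-Adapter call is made.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: turn (assumed) text-derived directions into an orthonormal basis."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project(v: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def compose(base: Sequence[float], concept: Sequence[float],
            basis: Sequence[Sequence[float]]) -> Vector:
    """Assumed composition rule: keep `base` outside the concept subspace,
    take `concept` inside it."""
    base_in = project(base, basis)
    concept_in = project(concept, basis)
    return [x - y + z for x, y, z in zip(base, base_in, concept_in)]


if __name__ == "__main__":
    # Toy stand-ins for embeddings; real CLIP embeddings have hundreds of dimensions.
    concept_basis = orthonormal_basis([[2.0, 0.0, 0.0]])
    base_image = [1.0, 2.0, 3.0]
    concept_image = [5.0, 6.0, 7.0]
    print(compose(base_image, concept_image, concept_basis))  # [5.0, 2.0, 3.0]
```

In this sketch the composite vector would then be passed to the image-conditioned generator in place of a single image embedding. Because the operation is pure linear algebra on precomputed embeddings, it is consistent with the training-free claim, though the actual construction used by the method may differ.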
Findings
Enables compositional image generation with multiple references
Provides more precise control over complex visual concepts
Operates without training or specialized data
Abstract
Content creators often draw inspiration from multiple visual sources, combining distinct elements to craft new compositions. Modern computational approaches now aim to emulate this fundamental creative process. Although recent diffusion models excel at text-guided compositional synthesis, text as a medium often lacks precise control over visual details. Image-based composition approaches can capture more nuanced features, but existing methods are typically limited in the range of concepts they can capture, and require expensive training procedures or specialized data. We present IP-Composer, a novel training-free approach for compositional image generation that leverages multiple image references simultaneously, while using natural language to describe the concept to be extracted from each image. Our method builds on IP-Adapter, which synthesizes novel images conditioned on an input…
Taxonomy
Topics: Semantic Web and Ontologies · Cognitive Computing and Networks
Methods: Diffusion · Contrastive Language-Image Pre-training
