Match-and-Fuse: Consistent Generation from Unstructured Image Sets
Kate Feingold, Omri Kaduri, Tali Dekel

TL;DR
Match-and-Fuse is a zero-shot, training-free method that generates consistent image sets sharing common content but differing in viewpoint or context, using a graph-based approach to ensure global coherence.
Contribution
Match-and-Fuse introduces a set-to-set generation framework that models an image set as a graph, enabling consistent multi-view content creation without supervision or masks.
Findings
Achieves state-of-the-art consistency in generated image sets.
Produces high-quality, coherent images across diverse viewpoints.
Operates without training or manual annotations.
Abstract
We present Match-and-Fuse - a zero-shot, training-free method for consistent controlled generation of unstructured image sets - collections that share a common visual element, yet differ in viewpoint, time of capture, and surrounding content. Unlike existing methods that operate on individual images or densely sampled videos, our framework performs set-to-set generation: given a source set and user prompts, it produces a new set that preserves cross-image consistency of shared content. Our key idea is to model the task as a graph, where each node corresponds to an image and each edge triggers a joint generation of image pairs. This formulation consolidates all pairwise generations into a unified framework, enforcing local consistency while ensuring global coherence across the entire set. This is achieved by fusing internal features across image pairs, guided by dense input…
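The graph formulation in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes a complete graph over the image set, one scalar "feature" per image location, and plain averaging over matched locations as a stand-in for the feature fusion step, whose details the abstract does not give here. The names `build_pair_graph`, `fuse_features`, and the `matches` layout are hypothetical.

```python
from itertools import combinations


def build_pair_graph(num_images):
    """Return the edges of a complete graph: one node per image,
    one edge per unordered image pair (an assumption for this sketch)."""
    return list(combinations(range(num_images), 2))


def fuse_features(features, matches, edges):
    """Toy fusion: replace each feature by the mean of itself and every
    feature it is matched to along any edge.

    features: list (one entry per image) of lists of floats
    matches:  dict mapping an edge (i, j) to a list of (p, q) pairs, meaning
              location p in image i corresponds to location q in image j
    """
    sums = [list(f) for f in features]
    counts = [[1] * len(f) for f in features]
    for i, j in edges:
        for p, q in matches.get((i, j), []):
            # Accumulate the partner's original feature in both directions.
            sums[i][p] += features[j][q]
            counts[i][p] += 1
            sums[j][q] += features[i][p]
            counts[j][q] += 1
    return [
        [s / c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(sums, counts)
    ]


# Three images, two locations each; location 0 is shared across all images.
edges = build_pair_graph(3)
features = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
matches = {(0, 1): [(0, 0)], (0, 2): [(0, 0)], (1, 2): [(0, 0)]}
fused = fuse_features(features, matches, edges)
```

In this toy setup the shared location ends up with the same value in all three images because every pairwise edge contributes to it, while the unmatched location keeps its per-image value. That mirrors, in a very simplified way, the idea of consolidating pairwise generations so that shared content stays consistent across the whole set.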
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
