Match-and-Fuse: Consistent Generation from Unstructured Image Sets

Kate Feingold; Omri Kaduri; Tali Dekel

arXiv:2511.22287 · cs.CV · March 17, 2026

TL;DR

Match-and-Fuse is a zero-shot, training-free method that generates consistent image sets sharing common content but differing in viewpoint or context, using a graph-based approach to ensure global coherence.

Contribution

The paper introduces a set-to-set generation framework that models an image set as a graph, enabling consistent multi-view content creation without supervision or masks.

Findings

01. Achieves state-of-the-art consistency in generated image sets.

02. Produces high-quality, coherent images across diverse viewpoints.

03. Operates without training or manual annotations.

Abstract

We present Match-and-Fuse, a zero-shot, training-free method for consistent controlled generation of unstructured image sets: collections that share a common visual element, yet differ in viewpoint, time of capture, and surrounding content. Unlike existing methods that operate on individual images or densely sampled videos, our framework performs set-to-set generation: given a source set and user prompts, it produces a new set that preserves cross-image consistency of shared content. Our key idea is to model the task as a graph, where each node corresponds to an image and each edge triggers a joint generation of image pairs. This formulation consolidates all pairwise generations into a unified framework, enforcing local consistency while ensuring global coherence across the entire set. This is achieved by fusing internal features across image pairs, guided by dense input…
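The graph formulation in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a complete graph over the set, stands in plain Python lists for the model's internal features, and uses a simple weighted average as the per-edge fusion step. The names `build_pair_graph`, `fuse_pair`, `consolidate`, and the `weight` parameter are all hypothetical, introduced only for illustration.

```python
from itertools import combinations


def build_pair_graph(num_images):
    """Edges of a complete graph over image indices (assumed topology)."""
    return list(combinations(range(num_images), 2))


def fuse_pair(feat_a, feat_b, weight=0.5):
    """Toy joint step for one edge: pull two feature vectors toward each other.

    A stand-in for pairwise feature fusion; the real method operates on
    internal features of a generative model, which is not modeled here.
    """
    fused_a = [(1 - weight) * a + weight * b for a, b in zip(feat_a, feat_b)]
    fused_b = [(1 - weight) * b + weight * a for a, b in zip(feat_a, feat_b)]
    return fused_a, fused_b


def consolidate(features, edges, weight=0.5):
    """Average each node's per-edge results into one vector per image."""
    sums = [[0.0] * len(f) for f in features]
    counts = [0] * len(features)
    for i, j in edges:
        fused_i, fused_j = fuse_pair(features[i], features[j], weight)
        for k, value in enumerate(fused_i):
            sums[i][k] += value
        for k, value in enumerate(fused_j):
            sums[j][k] += value
        counts[i] += 1
        counts[j] += 1
    # A node with no edges keeps its original features unchanged.
    return [
        [s / counts[n] for s in sums[n]] if counts[n] else list(features[n])
        for n in range(len(features))
    ]


if __name__ == "__main__":
    feats = [[0.0], [1.0], [2.0]]  # one scalar "feature" per image
    print(consolidate(feats, build_pair_graph(len(feats))))
```

In this toy setting, each edge enforces agreement between two images, and averaging over all edges incident to a node spreads that agreement across the whole set, which mirrors the local-versus-global distinction the abstract draws.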

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques