Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering
Ido Sobol, Chenfeng Xu, Or Litany

TL;DR
Zero-to-Hero introduces a test-time method that improves zero-shot view synthesis by filtering attention maps during denoising, leading to more consistent and realistic novel views without retraining.
Contribution
It proposes an attention map filtering technique, applied during the denoising process of diffusion-based view synthesis models such as Zero-1-to-3, that improves view synthesis quality without retraining and without significant additional computational cost.
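The general idea of filtering attention maps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`attention_map`, `filter_attention_maps`, `attend_with_filtered_map`) are hypothetical, and "filtering" is assumed here to mean an element-wise mean over several attention maps computed for the same layer (for example, from repeated denoising passes). The filtered map then replaces the individual maps when mixing the value vectors.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention weights with shape (n_queries, n_keys)."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1)

def filter_attention_maps(maps: list[np.ndarray]) -> np.ndarray:
    """Hypothetical filter: element-wise mean of several attention maps.

    Every row of every input map sums to 1, so the mean is already
    row-stochastic; the renormalization only guards against float drift.
    """
    avg = np.mean(np.stack(maps, axis=0), axis=0)
    return avg / avg.sum(axis=-1, keepdims=True)

def attend_with_filtered_map(maps: list[np.ndarray], v: np.ndarray) -> np.ndarray:
    """Mix the value vectors using the filtered (aggregated) attention map."""
    return filter_attention_maps(maps) @ v

rng = np.random.default_rng(0)
n_tokens, d = 8, 16
k = rng.normal(size=(n_tokens, d))
v = rng.normal(size=(n_tokens, d))
q_clean = rng.normal(size=(n_tokens, d))
# Noisy query estimates stand in for repeated denoising passes (an assumption).
maps = [attention_map(q_clean + 0.5 * rng.normal(size=q_clean.shape), k) for _ in range(4)]
out = attend_with_filtered_map(maps, v)
print(out.shape)  # (8, 16)
```

Averaging keeps the result a valid attention map (non-negative rows summing to 1) while damping fluctuations that appear in only some of the individual maps.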
Findings
Significant improvements in image fidelity and geometric consistency.
Effective across out-of-distribution objects and various conditioning scenarios.
Enhanced reliability of generated novel views.
Abstract
Generating realistic images from arbitrary views based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences. Recent advancements in diffusion models, particularly the Zero-1-to-3 model, have been widely adopted for generating plausible views, videos, and 3D models. However, these models still struggle with inconsistent and implausible results in novel view generation, especially for challenging changes in viewpoint. In this work, we propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps during the denoising process of Zero-1-to-3. By drawing an analogy between the denoising process and stochastic gradient descent (SGD), we implement a filtering mechanism that aggregates attention maps, enhancing generation reliability and…
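The SGD analogy can be illustrated numerically. In this sketch, which rests entirely on assumptions of ours, each denoising step is treated as producing a noisy estimate of an underlying attention map, much as each SGD step sees a noisy gradient. A momentum-style exponential moving average (the hypothetical `ema_filter`) is used as the aggregator; it is one plausible reading of the analogy, not the filter described in the paper. Noise is added directly in probability space, so individual estimates are not valid attention maps; the point is only to show how aggregation reduces estimation error.

```python
import numpy as np

def ema_filter(estimates: list[np.ndarray], beta: float = 0.9) -> np.ndarray:
    """Momentum-style exponential moving average over a sequence of estimates.

    Illustrative assumption: this mirrors how SGD with momentum smooths noisy
    gradients; it is not claimed to be the filter used by Zero-to-Hero.
    The weights sum to 1, so averaging row-stochastic maps stays row-stochastic.
    """
    running = estimates[0].copy()
    for est in estimates[1:]:
        running = beta * running + (1.0 - beta) * est
    return running

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
n_tokens, n_steps, sigma = 8, 64, 0.05
# A fixed "true" row-stochastic attention map (each row sums to 1).
logits = rng.normal(size=(n_tokens, n_tokens))
true_map = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
# Each step observes the true map plus zero-mean noise, like a stochastic gradient.
estimates = [true_map + sigma * rng.normal(size=true_map.shape) for _ in range(n_steps)]

err_single = mse(estimates[-1], true_map)
err_filtered = mse(ema_filter(estimates), true_map)
print(f"single estimate MSE: {err_single:.2e}")
print(f"EMA-filtered MSE:    {err_filtered:.2e}")
```

With `beta = 0.9`, the variance of the filtered estimate is roughly (1 - beta) / (1 + beta), about 5%, of the per-step noise variance, which is the same variance-reduction effect that makes averaged gradients more reliable than a single stochastic gradient.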
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
Methods: Sparse Evolutionary Training · Diffusion
