Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis
Saar Stern, Ido Sobol, Or Litany

TL;DR
This paper introduces a task-aware evaluation framework for novel view synthesis that uses foundation model features to reliably assess the realism and faithfulness of generated images, addressing limitations of existing metrics.
Contribution
It proposes two new metrics, $D_{\text{PRISM}}$ and $\text{MMD}_{\text{PRISM}}$, leveraging foundation model features for more accurate NVS evaluation, validated across multiple benchmarks.
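This summary does not say how $\text{MMD}_{\text{PRISM}}$ is computed, so the following is only a generic sketch of a Maximum Mean Discrepancy (MMD) estimate between two sets of feature vectors, not the authors' implementation. The RBF kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration, and random arrays stand in for features that would come from the foundation model.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """RBF kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two feature sets.

    The kernel choice and bandwidth are illustrative assumptions, not values
    taken from the paper.
    """
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder features: in practice these would be extracted by a model.
    reference = rng.normal(0.0, 1.0, size=(200, 8))
    similar = rng.normal(0.0, 1.0, size=(200, 8))
    shifted = rng.normal(3.0, 1.0, size=(200, 8))
    print("similar:", mmd_squared(reference, similar, bandwidth=4.0))
    print("shifted:", mmd_squared(reference, shifted, bandwidth=4.0))
```

Under these assumptions, a feature set drawn from a shifted distribution should score a noticeably larger value than one drawn from the same distribution as the reference, which is the property a distribution-level metric relies on.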
Findings
Both metrics effectively identify incorrect generations.
They produce rankings consistent with human preferences.
The framework improves reliability of NVS model assessment.
Abstract
The goal of Novel View Synthesis (NVS) is to generate realistic images of a given content from unseen viewpoints. But how can we trust that a generated image truly reflects the intended transformation? Evaluating its reliability remains a major challenge. While recent generative models, particularly diffusion-based approaches, have significantly improved NVS quality, existing evaluation metrics struggle to assess whether a generated image is both realistic and faithful to the source view and intended viewpoint transformation. Standard metrics, such as pixel-wise similarity and distribution-based measures, often mis-rank incorrect results as they fail to capture the nuanced relationship between the source image, viewpoint change, and generated output. We propose a task-aware evaluation framework that leverages features from a strong NVS foundation model, Zero123, combined with a…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
