Can These Views Be One Scene? Evaluating Multiview 3D Consistency when 3D Foundation Models Hallucinate

Soumava Paul; Prakhar Kaushik; Alan Yuille

arXiv:2605.18754·cs.CV·May 19, 2026

TL;DR

This paper evaluates the reliability of multiview 3D consistency metrics. It introduces a controlled robustness benchmark, shows that many metrics still assign high consistency scores when their backbones hallucinate geometry or when the inputs contain noise, and proposes more robust, failure-aware metrics.

Contribution

It introduces \benchmark, a controlled robustness benchmark for multiview 3D consistency, and a parametric family of neural metrics whose variants are more robust and failure-aware.

Findings

01

Existing metrics can hallucinate dense geometry and support unrelated scenes.

02

The proposed COLMAP-based metrics correlate better with human judgments.

03

Variants from the proposed parametric family are up to 3 times more robust.

Abstract

Multiview 3D evaluation assumes that the images being scored are observations of one static 3D scene. This assumption can fail in novel view synthesis (NVS) and sparse-view reconstruction: inputs or generated outputs may contain artifacts, outlier frames, repeated views, or noise, yet still receive high 3D consistency scores. Existing reference-based metrics require ground truth, while ground-truth-free metrics such as MEt3R depend on learned reconstruction backbones whose failure modes are poorly characterized. We study this reliability problem by comparing neural reconstruction priors with classical geometric verification. We introduce \benchmark, a controlled robustness benchmark for multiview 3D consistency, and a parametric family that decomposes neural metrics into backbone, residual, and aggregation components. This family recovers MEt3R and yields variants up to $3 \times$ more robust. Our analysis…
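
The backbone / residual / aggregation decomposition mentioned in the abstract can be pictured as three pluggable functions. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the class `PairwiseConsistencyMetric`, the toy `identity_backbone` (which performs no reconstruction or warping at all), the absolute-difference residual, and the rule that a pair with no valid overlap receives the worst score are all hypothetical choices made for this example. Images are assumed to be NumPy arrays with values in [0, 1], and lower scores mean higher consistency.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Sequence

import numpy as np


@dataclass
class PairwiseConsistencyMetric:
    # backbone(a, b) -> (features of a, features of b aligned to a, HxW validity mask)
    backbone: Callable[[np.ndarray, np.ndarray], tuple]
    # residual(fa, fb) -> per-pixel error map of shape HxW
    residual: Callable[[np.ndarray, np.ndarray], np.ndarray]
    # aggregate(per-pair scores) -> one scalar for the whole image set
    aggregate: Callable[[Sequence[float]], float]

    def score(self, views: Sequence[np.ndarray]) -> float:
        pair_scores = []
        for a, b in combinations(views, 2):
            fa, fb, mask = self.backbone(a, b)
            err = self.residual(fa, fb)
            # Assumption for this sketch: a pair with no valid overlap is
            # treated as maximally inconsistent instead of being skipped.
            pair_scores.append(float(err[mask].mean()) if mask.any() else 1.0)
        return self.aggregate(pair_scores)


def identity_backbone(a: np.ndarray, b: np.ndarray):
    # Toy stand-in: no reconstruction, no warp, every pixel marked valid.
    return a, b, np.ones(a.shape[:2], dtype=bool)


def abs_residual(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    # Mean absolute difference over channels, giving one value per pixel.
    diff = np.abs(fa.astype(float) - fb.astype(float))
    return diff.reshape(diff.shape[0], diff.shape[1], -1).mean(axis=-1)


metric = PairwiseConsistencyMetric(
    backbone=identity_backbone,
    residual=abs_residual,
    aggregate=lambda scores: float(np.mean(scores)),
)

same = [np.zeros((4, 4)), np.zeros((4, 4))]
different = [np.zeros((4, 4)), np.ones((4, 4))]
score_same = metric.score(same)
score_different = metric.score(different)
```

Swapping any one of the three callables (for example, a different aggregation such as a maximum over pairs) changes the metric without touching the other two, which is the point of such a decomposition.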
