Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry

Hoang Nguyen; Xiaohao Xu; Xiaonan Huang

arXiv:2512.15423 · cs.CV · December 18, 2025

TL;DR

This paper identifies the 3D Mirage, a failure in which monocular depth models hallucinate 3D structure from planar but perceptually ambiguous inputs, and addresses it with a benchmark, two evaluation metrics, and a self-distillation method aimed at structural and contextual robustness.

Contribution

It presents the first benchmark and metrics for 3D hallucination in monocular depth estimation, and proposes a grounded self-distillation approach to mitigate this issue.

Findings

01. Introduces the 3D-Mirage benchmark of real-world illusions.

02. Proposes two Laplacian-based evaluation metrics, DCS and CCS.

03. Shows that Grounded Self-Distillation reduces 3D hallucinations.

Abstract

Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structures from geometrically planar but perceptually ambiguous inputs. We term this failure the 3D Mirage. This paper introduces the first end-to-end framework to probe, quantify, and tame this unquantified safety risk. To probe, we present 3D-Mirage, the first benchmark of real-world illusions (e.g., street art) with precise planar-region annotations and context-restricted crops. To quantify, we propose a Laplacian-based evaluation framework with two metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability. To tame this failure, we introduce Grounded Self-Distillation, a parameter-efficient strategy that surgically enforces…
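
To make the idea of a Laplacian-based planarity check concrete, here is a minimal sketch. It is not the paper's DCS or CCS, whose definitions are not given in this excerpt. It relies only on a standard geometric fact: under a pinhole camera, the inverse depth of a planar surface is an affine function of pixel coordinates, so its discrete Laplacian is zero. The function name `laplacian_energy`, the 5-point stencil, and the synthetic "bump" are all assumptions made for illustration.

```python
import numpy as np

def laplacian_energy(inv_depth, mask):
    """Mean absolute 5-point Laplacian of inverse depth inside a mask.

    Illustrative proxy only (hypothetical helper, not the paper's DCS).
    A pixel contributes only if it and its four neighbours lie in the mask.
    """
    d = np.asarray(inv_depth, dtype=np.float64)
    m = np.asarray(mask, dtype=bool)
    # 5-point stencil evaluated on interior pixels.
    lap = (d[:-2, 1:-1] + d[2:, 1:-1] + d[1:-1, :-2] + d[1:-1, 2:]
           - 4.0 * d[1:-1, 1:-1])
    valid = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
             & m[1:-1, :-2] & m[1:-1, 2:])
    if not valid.any():
        return float("nan")
    return float(np.abs(lap[valid]).mean())

# Synthetic check: an affine inverse-depth map (a true plane) versus the
# same map with a hallucinated bump added in the middle.
v, u = np.mgrid[0:64, 0:64]
plane = 0.5 + 0.002 * u + 0.001 * v
bump = plane + 0.05 * np.exp(-((u - 32) ** 2 + (v - 32) ** 2) / 50.0)
mask = np.ones((64, 64), dtype=bool)

flat_score = laplacian_energy(plane, mask)
bump_score = laplacian_energy(bump, mask)
```

In this toy setup `flat_score` is zero up to floating-point error, while `bump_score` is strictly larger. In practice the mask would come from a planar-region annotation such as those the benchmark provides, and the input would be a model's predicted inverse depth.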

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Adversarial Robustness in Machine Learning