Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry
Hoang Nguyen, Xiaohao Xu, Xiaonan Huang

TL;DR
This paper identifies the 3D Mirage failure, in which monocular depth models hallucinate 3D structure from planar but perceptually ambiguous inputs, and addresses it with a benchmark, evaluation metrics, and a self-distillation method aimed at structural and contextual robustness.
Contribution
It presents the first benchmark and metrics for 3D hallucination in monocular depth estimation, and proposes a grounded self-distillation approach to mitigate this issue.
Findings
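Introduces 3D-Mirage, a benchmark of real-world illusions (e.g., street art) with planar-region annotations and context-restricted crops.
Proposes two Laplacian-based evaluation metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability.
Grounded Self-Distillation, a parameter-efficient strategy, effectively reduces 3D hallucinations.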
Abstract
Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structures from geometrically planar but perceptually ambiguous inputs. We term this failure the 3D Mirage. This paper introduces the first end-to-end framework to probe, quantify, and tame this unquantified safety risk. To probe, we present 3D-Mirage, the first benchmark of real-world illusions (e.g., street art) with precise planar-region annotations and context-restricted crops. To quantify, we propose a Laplacian-based evaluation framework with two metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability. To tame this failure, we introduce Grounded Self-Distillation, a parameter-efficient strategy that surgically enforces…
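To give a feel for why a Laplacian is a natural probe for spurious non-planarity, the sketch below applies a standard 5-point discrete Laplacian to a synthetic map inside an annotated planar mask. It is an illustration only, not the paper's definition of DCS or CCS. The function names, the mean-absolute aggregation, and the synthetic data are all assumptions made here, and the sketch assumes a map that is affine in pixel coordinates for a truly planar region, so that its Laplacian vanishes.

```python
import numpy as np


def laplacian(depth):
    """5-point discrete Laplacian evaluated on interior pixels only."""
    d = np.asarray(depth, dtype=float)
    return (
        d[:-2, 1:-1] + d[2:, 1:-1]      # neighbours above and below
        + d[1:-1, :-2] + d[1:-1, 2:]    # neighbours left and right
        - 4.0 * d[1:-1, 1:-1]           # centre pixel
    )


def mean_abs_laplacian(depth, mask):
    """Hypothetical score: mean |Laplacian| inside a planar-region mask.

    Illustrative aggregation only; not the DCS or CCS formula.
    """
    lap = np.abs(laplacian(depth))
    m = np.asarray(mask, dtype=bool)[1:-1, 1:-1]  # align mask with interior
    return float(lap[m].mean()) if m.any() else 0.0


# Synthetic data (assumed): an affine ramp, and the same ramp with a
# hallucinated bump added in the middle.
ys, xs = np.mgrid[0:32, 0:32]
flat = 2.0 + 0.01 * xs + 0.02 * ys
bump = flat + 0.5 * np.exp(-((xs - 16) ** 2 + (ys - 16) ** 2) / 20.0)
mask = np.ones((32, 32), dtype=bool)

flat_score = mean_abs_laplacian(flat, mask)
bump_score = mean_abs_laplacian(bump, mask)
print(flat_score, bump_score)
```

The affine ramp scores essentially zero, while the bump produces a clearly positive score, which is the behaviour one would want from any measure of hallucinated structure on a surface annotated as planar.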
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Adversarial Robustness in Machine Learning
