Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features

Yuanbo Xiangli; Ruojin Cai; Hanyu Chen; Jeffrey Byrne; Noah Snavely

arXiv:2412.05826·cs.CV·April 8, 2025

Open Access

TL;DR

Doppelgangers++ introduces a robust, scene-agnostic method for detecting visually similar but distinct surfaces to improve 3D reconstruction accuracy, leveraging a diversified dataset and a Transformer-based classifier with 3D-aware features.

Contribution

The paper presents a novel Transformer-based classifier using 3D-aware features and a diversified dataset to enhance visual disambiguation in 3D reconstruction tasks.

Findings

01. Significantly improves pairwise visual disambiguation accuracy.

02. Enhances 3D reconstruction quality across diverse real-world scenes.

03. Seamlessly integrates into existing SfM pipelines.

Abstract

Accurate 3D reconstruction is frequently hindered by visual aliasing, where visually similar but distinct surfaces (a.k.a. doppelgangers) are incorrectly matched. These spurious matches distort the structure-from-motion (SfM) process, leading to misplaced model elements and reduced accuracy. Prior efforts addressed this with CNN classifiers trained on curated datasets, but these approaches struggle to generalize across diverse real-world scenes and can require extensive parameter tuning. In this work, we present Doppelgangers++, a method to enhance doppelganger detection and improve 3D reconstruction accuracy. Our contributions include a diversified training dataset that incorporates geo-tagged images from everyday scenes to expand robustness beyond landmark-based datasets. We further propose a Transformer-based classifier that leverages 3D-aware features from the MASt3R model, achieving…
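To make the role of a pairwise classifier in an SfM pipeline concrete, the following minimal sketch shows one way its scores could be used to prune candidate image pairs before reconstruction. It is an illustration under stated assumptions, not the authors' implementation: `filter_pairs`, `toy_score`, the image names, the scores, and the 0.5 threshold are all hypothetical, and the score is assumed to be the probability that two images show the same physical surface.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def filter_pairs(
    pairs: Iterable[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[Pair], List[Pair]]:
    """Split candidate image pairs into kept and rejected lists.

    score_fn is assumed to return the probability that both images show
    the same physical surface (a true match rather than a doppelganger).
    """
    kept: List[Pair] = []
    rejected: List[Pair] = []
    for a, b in pairs:
        if score_fn(a, b) >= threshold:
            kept.append((a, b))
        else:
            rejected.append((a, b))
    return kept, rejected


# Toy stand-in for a learned classifier: hard-coded, made-up scores.
TOY_SCORES = {
    ("front_1.jpg", "front_2.jpg"): 0.93,  # same facade
    ("front_1.jpg", "back_1.jpg"): 0.08,   # look-alike facade on another side
}


def toy_score(a: str, b: str) -> float:
    # Look the pair up in either order; unknown pairs score 0.0.
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


kept, rejected = filter_pairs(TOY_SCORES.keys(), toy_score)
print("kept:", kept)
print("rejected:", rejected)
```

Only the pairs in `kept` would then be passed on to the SfM stage, so matches between look-alike surfaces never enter the reconstruction.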

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques