HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor

TL;DR
HaLo-NeRF introduces a novel system that combines vision-and-language models with Internet data to localize and understand semantic regions in large-scale landmark scenes, enhancing 3D scene representations.
Contribution
It presents a new localization approach that integrates large-scale Internet data and vision-language models to improve semantic understanding in 3D landmark scenes.
Findings
Accurately localizes semantic concepts in landmark scenes
Outperforms existing 3D models and 2D segmentation baselines
Leverages Internet data for fine-grained semantic knowledge
Abstract
Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In constrained 3D domains, recent methods have leveraged vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Data Visualization and Analytics · Advanced Image and Video Retrieval Techniques
