HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections

Chen Dudai; Morris Alper; Hana Bezalel; Rana Hanocka; Itai Lang; Hadar Averbuch-Elor

arXiv:2404.16845·cs.CV·August 6, 2024

TL;DR

HaLo-NeRF introduces a novel system that combines vision-and-language models with Internet data to localize and understand semantic regions in large-scale landmark scenes, enhancing 3D scene representations.

Contribution

The paper presents a localization approach that integrates large-scale Internet data with vision-and-language models to improve semantic understanding of 3D landmark scenes.

Findings

01 Accurately localizes semantic concepts in landmark scenes

02 Outperforms existing 3D models and 2D segmentation baselines

03 Leverages Internet data for fine-grained semantic knowledge

Abstract

Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In constrained 3D domains, recent methods have leveraged vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by…

Taxonomy

Topics: Image Retrieval and Classification Techniques · Data Visualization and Analytics · Advanced Image and Video Retrieval Techniques
