Text2Loc: 3D Point Cloud Localization from Natural Language
Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers

TL;DR
Text2Loc is a neural network that localizes a position within a 3D point cloud map from natural language descriptions, combining a hierarchical transformer with contrastive learning for improved accuracy and efficiency.
Contribution
The paper introduces Text2Loc, a novel neural network with a coarse-to-fine pipeline and a matching-free fine localization method for 3D point cloud localization from natural language.
Findings
Improves localization accuracy by up to 2x over the prior state of the art.
Uses a hierarchical transformer with max-pooling for semantic relationship modeling.
Proposes a faster, more accurate matching-free fine localization method.
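
To make the coarse (global place recognition) stage concrete, here is a minimal sketch of max-pooling per-hint embeddings into one text descriptor and retrieving the best-matching submaps. It is not the authors' implementation: the function names, the toy 3-dimensional embeddings, and the use of cosine similarity for ranking are all assumptions made for illustration.

```python
import math

def max_pool(hint_embeddings):
    """Element-wise max over a list of equal-length hint vectors (hypothetical helper)."""
    return [max(dims) for dims in zip(*hint_embeddings)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(text_descriptor, submap_descriptors, k):
    """Return indices of the k submaps most similar to the text descriptor."""
    ranked = sorted(
        range(len(submap_descriptors)),
        key=lambda i: cosine(text_descriptor, submap_descriptors[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy data (assumed): two hint embeddings and three submap descriptors.
hints = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
query = max_pool(hints)
submaps = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
top = retrieve_top_k(query, submaps, k=2)
print(query, top)
```

In a coarse-to-fine pipeline, the retrieved candidates would then be passed to the fine localization stage, which this sketch does not cover.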
Abstract
We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, relational dynamics among each textual hint are captured in a hierarchical transformer with max-pooling (HTM), whereas a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions, which completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves the…
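
The text-submap contrastive learning mentioned in the abstract can be illustrated with a symmetric, InfoNCE-style loss over a batch of paired embeddings. The abstract does not give the exact formulation, so the loss form, the temperature value, and all names below are assumptions; this is a sketch of the general technique, not the paper's loss.

```python
import math

def row_loss(sim, temperature):
    """Mean cross-entropy of each row against its diagonal (matching) entry."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n

def contrastive_loss(text_emb, submap_emb, temperature=0.1):
    """Symmetric loss: text-to-submap plus submap-to-text, averaged.

    Pair i of text_emb and submap_emb is the positive; every other
    combination in the batch acts as a negative. The temperature is
    a hypothetical default, not a value from the paper.
    """
    sim = [[sum(a * b for a, b in zip(t, s)) for s in submap_emb] for t in text_emb]
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (row_loss(sim, temperature) + row_loss(sim_t, temperature))

# Toy batch (assumed): aligned pairs should score a lower loss than swapped ones.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(text, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(text, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Treating every non-matching pair in the batch as a negative is one common way to keep positives and negatives in balance; the paper may weight or sample pairs differently.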
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis
Methods: Hierarchical Information Threading
