Instance-free Text to Point Cloud Localization with Relative Position Awareness
Lichao Wang, Zhihao Yuan, Jinke Ren, Shuguang Cui, Zhen Li

TL;DR
This paper introduces an instance-free, two-stage text-to-point-cloud localization model that uses relative position awareness to improve spatial understanding without relying on ground-truth instances, achieving competitive results on the KITTI360Pose dataset.
Contribution
The paper presents a novel two-stage localization approach that uses relative position-aware modules and instance query extractors, eliminating the need for ground-truth instances.
Findings
Achieves competitive performance on the KITTI360Pose dataset.
Effectively models spatial relations among instances.
Does not require ground-truth instances as input.
Abstract
Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration. It seeks to localize a position from a city-scale point cloud scene based on a few natural language instructions. In this paper, we address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relative positions among potential instances. Our proposed model follows a two-stage pipeline, including a coarse stage for text-cell retrieval and a fine stage for position estimation. In both stages, we introduce an instance query extractor, in which the cells are encoded by a 3D sparse convolution U-Net to generate the multi-scale point cloud features, and a set of queries iteratively attend to these features to represent instances. In the coarse stage, a row-column relative position-aware…
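The coarse stage described above, text-cell retrieval, can be illustrated with a minimal sketch. It assumes the text description and each cell have already been encoded into fixed-length embedding vectors, and it ranks cells by cosine similarity. The function names, the toy embeddings, and the choice of cosine similarity are all assumptions made for illustration; the abstract does not specify the similarity measure, and the paper's instance query extractor and relative position-aware modules are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k_cells(text_embedding, cell_embeddings, k=3):
    """Hypothetical coarse stage: rank cells by similarity to the text embedding.

    cell_embeddings maps a cell id to its embedding vector.
    Returns the k best (cell_id, score) pairs, highest score first.
    """
    scored = [
        (cell_id, cosine_similarity(text_embedding, embedding))
        for cell_id, embedding in cell_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (made up for illustration, not taken from the paper).
cells = {
    "cell_a": [1.0, 0.0, 0.0],
    "cell_b": [0.0, 1.0, 0.0],
    "cell_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.1, 0.0]

top_cells = retrieve_top_k_cells(query, cells, k=2)
for cell_id, score in top_cells:
    print(cell_id, round(score, 3))
```

In a two-stage pipeline of this kind, the fine stage would then estimate a position only within the retrieved cells, so the quality of this ranking bounds the final localization accuracy.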
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction
Methods: Sparse Evolutionary Training · Max Pooling · Concatenated Skip Connection · U-Net · Convolution
