Instance-free Text to Point Cloud Localization with Relative Position Awareness

Lichao Wang; Zhihao Yuan; Jinke Ren; Shuguang Cui; Zhen Li

arXiv:2404.17845·cs.CV·April 30, 2024

TL;DR

This paper introduces an instance-free, two-stage text-to-point-cloud localization model. It uses relative position awareness to capture spatial relations among instances without relying on ground-truth instances as input, and it achieves competitive results on the KITTI360Pose dataset.

Contribution

The paper presents a novel two-stage localization approach that uses relative position-aware modules and instance query extractors, eliminating the need for ground-truth instances.

Findings

01 Achieves competitive performance on the KITTI360Pose dataset.

02 Effectively models spatial relations among instances.

03 Does not require ground-truth instances as input.

Abstract

Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration. It seeks to localize a position from a city-scale point cloud scene based on a few natural language instructions. In this paper, we address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relative positions among potential instances. Our proposed model follows a two-stage pipeline, including a coarse stage for text-cell retrieval and a fine stage for position estimation. In both stages, we introduce an instance query extractor, in which the cells are encoded by a 3D sparse convolution U-Net to generate the multi-scale point cloud features, and a set of queries iteratively attend to these features to represent instances. In the coarse stage, a row-column relative position-aware…
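The instance query extractor described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: random arrays stand in for the multi-scale features that the 3D sparse convolution U-Net would produce, the attention step has no learned projections, and the names `cross_attend` and `extract_instance_queries`, the feature width of 8, and the query count of 4 are all assumptions made for illustration. It only shows the idea of a fixed set of queries attending to one feature scale after another.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, feats):
    """One cross-attention step (hypothetical, no learned weights).

    queries: (Q, d) array, one row per candidate instance.
    feats:   (N, d) array of point features at one scale.
    """
    d = queries.shape[-1]
    attn = softmax(queries @ feats.T / np.sqrt(d), axis=-1)  # (Q, N)
    return queries + attn @ feats  # residual update, (Q, d)


def extract_instance_queries(multi_scale_feats, num_queries=4, seed=0):
    """Let a set of queries attend to each feature scale in turn."""
    rng = np.random.default_rng(seed)
    d = multi_scale_feats[0].shape[1]
    queries = rng.standard_normal((num_queries, d))  # stand-in for learned queries
    for feats in multi_scale_feats:
        queries = cross_attend(queries, feats)
    return queries


# Placeholder features at three scales (assumed sizes), each of width 8.
rng = np.random.default_rng(1)
multi_scale_feats = [rng.standard_normal((n, 8)) for n in (16, 64, 256)]
instance_queries = extract_instance_queries(multi_scale_feats)
print(instance_queries.shape)  # (4, 8)
```

In this sketch each row of `instance_queries` plays the role of one potential instance, so downstream stages would consume these rows in place of ground-truth instance inputs.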

Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction

Methods: Sparse Evolutionary Training · Max Pooling · Concatenated Skip Connection · U-Net · Convolution