Text to Point Cloud Localization with Relation-Enhanced Transformer

Guangzhi Wang; Hehe Fan; Mohan Kankanhalli

arXiv:2301.05372 · cs.CV · January 16, 2023

TL;DR

This paper introduces a Relation-Enhanced Transformer that improves text-to-point-cloud localization by explicitly modeling relations among hints, achieving state-of-the-art results on the KITTI360Pose dataset.

Contribution

The paper proposes a novel Relation-Enhanced Transformer with a relation-aware self-attention mechanism for improved cross-modal localization.
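To make the idea of relation-aware self-attention concrete, the following is a minimal sketch of one generic way to inject pairwise relations into attention: a relation score is added to each attention logit before the softmax. It is not the paper's formulation. The function names, the choice to use the same vectors as queries, keys, and values, and the negative-distance relation score are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_aware_attention(features, relation_bias):
    """Single-head self-attention whose logits are shifted by a pairwise term.

    features:      n vectors of d floats, used as queries, keys and values
                   (no learned projections; an illustrative simplification).
    relation_bias: n x n floats; entry [i][j] is a hypothetical relation
                   score between items i and j.
    """
    n = len(features)
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for i in range(n):
        logits = [
            scale * sum(a * b for a, b in zip(features[i], features[j]))
            + relation_bias[i][j]
            for j in range(n)
        ]
        w = softmax(logits)
        weights.append(w)
        # Each output is the attention-weighted average of all feature vectors.
        outputs.append(
            [sum(w[j] * features[j][k] for j in range(n)) for k in range(d)]
        )
    return outputs, weights


def negative_distance_bias(centers):
    """Hypothetical relation score: nearby instances get a larger bias."""
    return [[-math.dist(p, q) for q in centers] for p in centers]


# Three toy instances: items 0 and 2 have similar features but are far apart.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]

zero_bias = [[0.0] * 3 for _ in range(3)]
_, plain_weights = relation_aware_attention(features, zero_bias)
_, relation_weights = relation_aware_attention(
    features, negative_distance_bias(centers)
)
```

With a zero bias the function reduces to plain scaled dot-product attention, so item 0 attends more to the similar-looking item 2 than to item 1. With the distance-based bias, the nearby item 1 receives more weight than the distant item 2, which illustrates how an explicit relation term can separate instances that look alike but sit in different places.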

Findings

01. Outperforms previous methods on KITTI360Pose dataset

02. Explicit relation encoding enhances discriminability in localization

03. Effective in city-scale point cloud scenarios

Abstract

Automatically localizing a position based on a few natural language instructions is essential for future robots to communicate and collaborate with humans. To approach this goal, we focus on the text-to-point-cloud cross-modal localization problem. Given a textual query, it aims to identify the described location from city-scale point clouds. The task involves two challenges. 1) In city-scale point clouds, similar ambient instances may exist in several locations. Searching each location in a huge point cloud with only instances as guidance may lead to less discriminative signals and incorrect results. 2) In textual descriptions, the hints are provided separately. In this case, the relations among those hints are not explicitly described, leading to difficulties of learning relations. To overcome these two challenges, we propose a unified Relation-Enhanced Transformer (RET) to improve…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Linear Layer · Dropout · Softmax · Adam · Multi-Head Attention · Residual Connection · Label Smoothing