MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms
Tianyi Shang, Zhenyu Li, Pengjie Xu, Jinwei Qiao

TL;DR
MambaPlace introduces a novel cross-modal place recognition framework that effectively fuses natural language descriptions with 3D point clouds using attention mechanisms, significantly improving localization accuracy.
Contribution
The paper proposes MambaPlace, a new end-to-end framework utilizing attention Mamba mechanisms for enhanced multimodal fusion in place recognition tasks.
Findings
Achieves higher localization accuracy on the KITTI360Pose dataset.
Effectively captures complex intra- and inter-modal correlations.
Outperforms existing state-of-the-art methods.
Abstract
Vision Language Place Recognition (VLVPR) enhances robot localization performance by incorporating natural language descriptions from images. By utilizing language information, VLVPR directs robot place matching, overcoming the constraint of depending solely on vision. The essence of multimodal fusion lies in mining the complementary information between different modalities. However, general fusion methods rely on traditional neural architectures and are not well equipped to capture the dynamics of cross-modal interactions, especially in the presence of complex intra-modal and inter-modal correlations. To this end, this paper proposes a novel coarse-to-fine, end-to-end connected cross-modal place recognition framework called MambaPlace. In the coarse localization stage, the text description and 3D point cloud are encoded by the pretrained T5 and instance encoder, respectively. They…
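To make the coarse localization stage more concrete, the sketch below shows one common way a text embedding can be matched against candidate point cloud cells: rank the cells by cosine similarity and keep the top few. This is an illustration under stated assumptions, not the authors' implementation. The names `cosine`, `retrieve_top_k`, `text_emb`, and `cell_embs` are hypothetical, and the toy vectors merely stand in for the outputs of the pretrained T5 and the instance encoder, which are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(text_emb, cell_embs, k=3):
    """Return the k (cell_id, score) pairs most similar to the text embedding.

    text_emb:  hypothetical embedding of the text description.
    cell_embs: hypothetical dict mapping a cell id to its point cloud embedding.
    """
    scored = [(cell_id, cosine(text_emb, emb)) for cell_id, emb in cell_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data only: real embeddings would come from the text and point cloud encoders.
text_emb = [1.0, 0.0, 0.0]
cell_embs = {
    "cell_a": [0.9, 0.1, 0.0],
    "cell_b": [0.0, 1.0, 0.0],
    "cell_c": [0.5, 0.5, 0.0],
}

top = retrieve_top_k(text_emb, cell_embs, k=2)
top_ids = [cell_id for cell_id, _ in top]
```

With these toy vectors, `cell_a` ranks first and `cell_c` second. In a coarse-to-fine pipeline, a fine stage would then refine the position within the retrieved cells; that step is not sketched here because the abstract above is truncated before describing it.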
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Geographic Information Systems Studies
Methods: Attention Is All You Need · Gated Linear Unit · Dense Connections · Byte Pair Encoding · Softmax · Linear Layer · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Dropout · Inverse Square Root Schedule
