Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna

TL;DR
This paper introduces a novel multimodal transformer architecture for robust object-instance re-identification (ReID) from RGB and depth data, improving localization and perception in cluttered scenes and under varying illumination.
Contribution
The paper proposes a dual-path transformer model for multimodal ReID and a ReID-based localization framework, both validated on custom and public RGB-D datasets.
Findings
ReID accuracy achieved mAP of 75.18
Localization success rate of 83% on TUM-RGBD
Depth data enhances ReID robustness in challenging scenes
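For readers unfamiliar with the mAP figure above, the following is a minimal sketch of mean average precision as it is commonly defined for retrieval-style ReID evaluation. This summary does not state the exact evaluation protocol used in the paper, so the function names (`average_precision`, `mean_average_precision`) and the input format (one ranked list of match/non-match flags per query) are assumptions made for illustration only.

```python
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """Average precision for one query.

    `relevant` holds one flag per gallery item, ordered from the best-ranked
    match to the worst; True means the item is the same object instance as
    the query. (Hypothetical input format, not taken from the paper.)
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(relevant, start=1):
        if is_match:
            hits += 1
            # Precision measured at the rank of each correct match.
            precision_sum += hits / rank
    # A query with no correct match in the gallery contributes 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

Under this definition a score is a fraction between 0 and 1; reported values such as the one above are often that fraction scaled to a percentage, though the summary does not say which convention applies here.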
Abstract
Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Image and Object Detection Techniques
Methods: 1x1 Convolution · Batch Normalization · Convolution · Thinned U-shape Module
