Signal: Selective Interaction and Global-local Alignment for Multi-Modal Object Re-Identification
Yangyang Liu, Yuhao Wang, Pingping Zhang

TL;DR
This paper introduces Signal, a multi-modal object Re-ID framework that employs selective interaction and global-local alignment modules to enhance feature discriminability and reduce background interference, validated on three benchmarks.
Contribution
The paper proposes a novel framework with selective interaction and alignment modules, improving multi-modal feature discrimination and consistency in object Re-ID tasks.
Findings
Outperforms existing methods on the RGBNT201, RGBNT100, and MSVR310 benchmarks.
Effective in reducing background interference and enhancing feature discriminability.
Demonstrates significant improvements in multi-modal object Re-ID accuracy.
Abstract
Multi-modal object Re-IDentification (ReID) is devoted to retrieving specific objects by exploiting complementary multi-modal image information. Existing methods mainly concentrate on fusing multi-modal features while neglecting background interference. In addition, current multi-modal fusion methods often focus on aligning modality pairs but struggle with multi-modal consistency alignment. To address these issues, we propose a novel selective interaction and global-local alignment framework called Signal for multi-modal object ReID. Specifically, we first propose a Selective Interaction Module (SIM) to select important patch tokens using intra-modal and inter-modal information. These important patch tokens then interact with the class tokens, yielding more discriminative features. Then, we propose a Global Alignment Module (GAM) to simultaneously align…
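The token selection and class-token interaction described for SIM can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract excerpt does not specify the scoring rule, so the sketch assumes that a patch token's importance is its average cosine similarity to the class tokens of all modalities (a stand-in for "intra-modal and inter-modal information"), and that the interaction is a single-head attention step with a residual connection. The function names `select_tokens` and `interact` and the toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def select_tokens(cls_tokens, patch_tokens, k):
    """Return indices of the k patch tokens with the highest importance.

    Assumption: importance is the mean cosine similarity between a patch
    token and the class tokens of every modality (own and others).
    """
    scores = [
        sum(cosine(c, p) for c in cls_tokens) / len(cls_tokens)
        for p in patch_tokens
    ]
    order = sorted(range(len(patch_tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])  # keep the original token order

def interact(cls_token, selected):
    """Assumed interaction: the class token attends over the selected patches.

    The class token is the query; the selected patch tokens serve as both
    keys and values. The attended vector is added back as a residual.
    """
    scale = math.sqrt(len(cls_token))
    logits = [dot(cls_token, p) / scale for p in selected]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    attended = [
        sum(w * p[d] for w, p in zip(weights, selected))
        for d in range(len(cls_token))
    ]
    return [c + a for c, a in zip(cls_token, attended)]

# Toy example: two modalities' class tokens and four patch tokens of one modality.
rgb_cls = [1.0, 0.0]
nir_cls = [0.8, 0.2]
patch_tokens = [[0.9, 0.1], [0.0, 1.0], [0.8, -0.2], [-1.0, 0.0]]

kept = select_tokens([rgb_cls, nir_cls], patch_tokens, k=2)
updated_cls = interact(rgb_cls, [patch_tokens[i] for i in kept])
```

In this toy setup the two patches that point in roughly the same direction as the class tokens are kept, and the dissimilar patches (which here play the role of background) never reach the class token. A real model would operate on learned transformer embeddings and would likely use learned projections for the attention step.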
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection
