AXM-Net: Implicit Cross-Modal Feature Alignment for Person Re-identification

Ammarah Farooq; Muhammad Awais; Josef Kittler; Syed Safwan Khalid

arXiv:2101.08238·cs.CV·July 22, 2022

TL;DR

AXM-Net is a CNN architecture with an implicit semantic alignment mechanism for cross-modal (text and image) person re-identification. It reports 64.44% Rank@1 on CUHK-PEDES for person search and gains of over 10% in cross-viewpoint text-to-image Re-ID.

Contribution

The paper proposes AXM-Block and a unified framework for implicit cross-modal semantic alignment, enhancing visual-textual feature coherence for person Re-ID.

Findings

01 Achieves 64.44% Rank@1 on CUHK-PEDES, surpassing the previous state of the art.

02 Outperforms competing methods by over 10% in cross-viewpoint text-to-image Re-ID.

03 Uses textual data effectively as supervision for visual feature learning.
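For readers unfamiliar with the Rank@1 figure quoted in the findings, the following is a minimal sketch of how a Rank@k score is commonly computed for text-to-image retrieval. It is not the authors' evaluation code. The function names (`cosine`, `rank_at_k`), the use of cosine similarity, and the toy two-dimensional features are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_at_k(query_feats, query_ids, gallery_feats, gallery_ids, k=1):
    """Fraction of queries whose true identity appears among the k
    most similar gallery items (hypothetical helper, not from the paper)."""
    hits = 0
    for q, qid in zip(query_feats, query_ids):
        scores = [cosine(q, g) for g in gallery_feats]
        # Gallery indices sorted from most to least similar.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top_ids = [gallery_ids[i] for i in order[:k]]
        if qid in top_ids:
            hits += 1
    return hits / len(query_feats)

# Toy data (made up): three gallery image features, three text query features.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_ids = [0, 1, 2]
query_feats = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
query_ids = [0, 1, 0]

rank1 = rank_at_k(query_feats, query_ids, gallery_feats, gallery_ids, k=1)
rank2 = rank_at_k(query_feats, query_ids, gallery_feats, gallery_ids, k=2)
print(f"Rank@1 = {rank1:.4f}, Rank@2 = {rank2:.4f}")
```

In this toy setup the third query is ranked closest to the wrong identity, so it counts as a miss at k=1 and as a hit at k=2. A reported value such as 64.44% Rank@1 means that, under the benchmark's own protocol, roughly 64 of every 100 text queries retrieved a correct image in first place.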

Abstract

Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align cross-modality representations induced by the semantic information present for a person and ignore background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual representations. The underlying building block, named AXM-Block, is a unified multi-layer network that dynamically exploits the multi-scale knowledge from both modalities and re-calibrates each modality according to shared semantics. To complement the convolutional design, contextual attention is applied in the text branch to manipulate long-term dependencies. Moreover, we propose a unique design to enhance visual part-based feature coherence and locality information. Our framework is novel in…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Human Pose and Action Recognition