Context Sensing Attention Network for Video-based Person Re-identification
Kan Wang, Changxing Ding, Jianxin Pang, Xiangmin Xu

TL;DR
This paper introduces CSA-Net, a network for video-based person re-identification that uses context-aware attention to improve both frame feature extraction and temporal aggregation, achieving state-of-the-art results on four datasets.
Contribution
The paper proposes CSA-Net, built on two modules. CSCA emphasizes informative channels in each frame's features, using both the frame itself and the global context of the sequence. CFA predicts per-frame weights for temporal aggregation.
Findings
Achieves state-of-the-art performance on four datasets.
Emphasizes informative channels and frames by drawing on the global context of the sequence.
Improves robustness to interference in video frames.
Abstract
Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames. Recent approaches handle this problem using temporal aggregation strategies. In this work, we propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction and temporal aggregation steps. First, we introduce the Context Sensing Channel Attention (CSCA) module, which emphasizes responses from informative channels for each frame. These informative channels are identified with reference not only to each individual frame, but also to the content of the entire sequence. Therefore, CSCA explores both the individuality of each frame and the global context of the sequence. Second, we propose the Contrastive Feature Aggregation (CFA) module, which predicts frame weights for temporal aggregation. Here, the weight for each frame is…
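The two steps described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the real modules are learned networks, whereas here the channel gate uses a fixed, hypothetical mixing coefficient `mix`, and the sequence mean stands in for "global context". The abstract is cut off before it states how CFA derives frame weights, so the scoring rule below (similarity to the sequence mean) is only a placeholder assumption. All function names are invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_mean(frames):
    # Channel-wise mean over all frames: an assumed stand-in for global context.
    count = len(frames)
    return [sum(channel) / count for channel in zip(*frames)]


def context_channel_gate(frames, mix=0.5):
    # Gate each channel of each frame using the frame's own response and the
    # sequence-level context. `mix` is a hypothetical fixed coefficient that
    # replaces the learned layers a real attention module would use.
    context = sequence_mean(frames)
    gated = []
    for frame in frames:
        gates = [sigmoid(mix * x + (1.0 - mix) * c) for x, c in zip(frame, context)]
        gated.append([g * x for g, x in zip(gates, frame)])
    return gated


def weighted_aggregate(frames):
    # Placeholder scoring rule (an assumption, not the paper's CFA): score each
    # frame by its dot product with the sequence mean, then softmax the scores.
    context = sequence_mean(frames)
    scores = [sum(x * c for x, c in zip(frame, context)) for frame in frames]
    weights = softmax(scores)
    # Weighted sum over frames, computed channel by channel.
    video = [sum(w * x for w, x in zip(weights, channel)) for channel in zip(*frames)]
    return video, weights


# Toy sequence: three frames with two channels each; the third frame is the outlier.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
gated = context_channel_gate(frames)
video, weights = weighted_aggregate(gated)
print(weights)
print(video)
```

In this toy sequence the third frame disagrees with the other two, so under the assumed scoring rule it receives the smallest aggregation weight. Whether this matches the behavior of the actual CFA module cannot be confirmed from the truncated abstract.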
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Human Pose and Action Recognition
