Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention
Xingyu Liao, Lingxiao He, Zhouwang Yang, Chi Zhang

TL;DR
This paper introduces a video-based person re-identification method that combines 3D convolutional networks with non-local attention to capture spatial-temporal dependencies and address spatial misalignment. It reports better results than state-of-the-art methods on three datasets.
Contribution
The paper proposes a new framework that integrates 3D convolutions and non-local blocks for improved feature aggregation in video-based person ReID, handling temporal dependency and spatial misalignment.
Findings
Outperforms state-of-the-art methods on three datasets.
Effectively captures spatial-temporal dependencies.
Addresses spatial misalignment with non-local attention.
Abstract
Video-based person re-identification (ReID) is a challenging problem, where some video tracks of people across non-overlapping cameras are available for matching. Feature aggregation from a video track is a key step for video-based person ReID. Many existing methods tackle this problem by average/maximum temporal pooling or RNNs with attention. However, these methods cannot deal with temporal dependency and spatial misalignment problems at the same time. We are inspired by video action recognition that involves the identification of different actions from video tracks. Firstly, we use 3D convolutions on video volume, instead of using 2D convolutions across frames, to extract spatial and temporal features simultaneously. Secondly, we use a non-local block to tackle the misalignment problem and capture spatial-temporal long-range dependencies. As a result, the network can learn useful…
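The non-local step described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: it assumes the embedded-Gaussian form of the non-local operation, treats each 1x1x1 convolution as a plain matrix acting on the channel axis, and uses made-up sizes and names (`w_theta`, `w_phi`, `w_g`, `w_out`, `C_inner`). Every position in the T x H x W feature volume attends to every other position, so a body part that shifts between frames can still be matched, and a residual connection adds the result back to the input.

```python
import numpy as np

def softmax(a, axis=-1):
    # Subtract the row maximum for numerical stability.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block over a video feature volume.

    x: array of shape (T, H, W, C).
    w_theta, w_phi, w_g: (C, C_inner) matrices standing in for 1x1x1 convs.
    w_out: (C_inner, C) matrix projecting back to C channels.
    """
    t, h, w, c = x.shape
    flat = x.reshape(t * h * w, c)            # N x C, with N = T*H*W positions
    theta = flat @ w_theta                    # queries, N x C_inner
    phi = flat @ w_phi                        # keys,    N x C_inner
    g = flat @ w_g                            # values,  N x C_inner
    attn = softmax(theta @ phi.T, axis=-1)    # N x N pairwise weights
    y = attn @ g                              # weighted sum over all positions
    z = y @ w_out + flat                      # residual connection
    return z.reshape(t, h, w, c), attn

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
T, H, W, C, C_inner = 4, 8, 4, 16, 8
x = rng.standard_normal((T, H, W, C))
w_theta = rng.standard_normal((C, C_inner)) * 0.1
w_phi = rng.standard_normal((C, C_inner)) * 0.1
w_g = rng.standard_normal((C, C_inner)) * 0.1
# Assumption: zero-initialised output projection, so the block starts as identity.
w_out = np.zeros((C_inner, C))

z, attn = non_local_block(x, w_theta, w_phi, w_g, w_out)
print(z.shape, attn.shape)
```

Because the attention matrix spans all T*H*W positions, its cost grows quadratically with the size of the volume. The number of such blocks and their placement in the network are details this summary does not give.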
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gait Recognition and Analysis
Methods: Residual Connection · Non-Local Operation · 1x1 Convolution · Non-Local Block
