Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention

Xingyu Liao; Lingxiao He; Zhouwang Yang; Chi Zhang

arXiv:1807.05073 · cs.CV · April 30, 2019 · 5 cites

TL;DR

This paper introduces a novel video-based person re-identification method using 3D convolutional networks combined with non-local attention to effectively capture spatial-temporal dependencies and address misalignment issues, outperforming existing methods.

Contribution

The paper proposes a new framework that integrates 3D convolutions and non-local blocks for improved feature aggregation in video-based person ReID, handling temporal dependency and spatial misalignment.
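To illustrate why convolving over the whole video volume can capture temporal dependency where temporal pooling cannot, here is a minimal pure-Python sketch. It is not the paper's network: the single-channel helpers `conv3d_valid` and `temporal_average_pool`, the toy clip, and the temporal-difference kernel are hypothetical names and values chosen only for illustration.

```python
def conv3d_valid(volume, kernel):
    """Single-channel 3D cross-correlation, stride 1, no padding.

    volume: nested list indexed [t][y][x]; kernel: nested list [dt][dy][dx].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The window spans time as well as space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def temporal_average_pool(volume):
    """Average each pixel over time; frame order is discarded."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [[sum(volume[t][y][x] for t in range(T)) / T for x in range(W)]
            for y in range(H)]


# Toy clip (assumed data): 3 frames of 2x2 pixels, brightness rising over time.
clip = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]
reversed_clip = clip[::-1]

# Hypothetical 2x1x1 kernel that measures frame-to-frame change.
diff_kernel = [[[-1.0]], [[1.0]]]

print(temporal_average_pool(clip) == temporal_average_pool(reversed_clip))
print(conv3d_valid(clip, diff_kernel)[0][0][0],
      conv3d_valid(reversed_clip, diff_kernel)[0][0][0])
```

In this toy setting, average pooling returns the same map for the clip and its time-reversed copy, while the 3D kernel's response changes sign, so the ordering of frames survives in the 3D features.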

Findings

01. Outperforms state-of-the-art methods on three datasets.

02. Effectively captures spatial-temporal dependencies.

03. Addresses spatial misalignment with non-local attention.

Abstract

Video-based person re-identification (ReID) is a challenging problem, where some video tracks of people across non-overlapping cameras are available for matching. Feature aggregation from a video track is a key step for video-based person ReID. Many existing methods tackle this problem by average/maximum temporal pooling or RNNs with attention. However, these methods cannot deal with temporal dependency and spatial misalignment problems at the same time. We are inspired by video action recognition that involves the identification of different actions from video tracks. Firstly, we use 3D convolutions on video volume, instead of using 2D convolutions across frames, to extract spatial and temporal features simultaneously. Secondly, we use a non-local block to tackle the misalignment problem and capture spatial-temporal long-range dependencies. As a result, the network can learn useful…
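The non-local block mentioned in the abstract can be sketched as follows. This is a minimal pure-Python illustration, assuming the embedded-Gaussian (softmax) form of the non-local operation with a residual connection; the excerpt does not state which variant, channel sizes, or insertion points the authors use, so the function names, weight matrices, and shapes here are hypothetical. Each per-position linear map plays the role of a 1x1 convolution, and `features` stands for a video feature map flattened to N = T*H*W positions with C channels each.

```python
import math


def linear(x, weight):
    """Per-position linear map, i.e. what a 1x1 convolution applies at one position."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_block(features, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local operation with a residual connection.

    features: list of N positions, each a list of C floats.
    w_theta, w_phi, w_g: C' x C matrices; w_z: C x C' matrix (all assumed shapes).
    """
    theta = [linear(x, w_theta) for x in features]
    phi = [linear(x, w_phi) for x in features]
    g = [linear(x, w_g) for x in features]
    out = []
    for i, x in enumerate(features):
        # Compare position i with every position j in space and time.
        scores = [sum(a * b for a, b in zip(theta[i], p)) for p in phi]
        attn = softmax(scores)
        # Weighted sum of the transformed features of all positions.
        y = [sum(a * gj[c] for a, gj in zip(attn, g)) for c in range(len(g[0]))]
        z = linear(y, w_z)
        # Residual connection: the block adds to its input rather than replacing it.
        out.append([zi + xi for zi, xi in zip(z, x)])
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, 2 channels (assumed data)

print(non_local_block(toy, identity, identity, identity, zeros) == toy)
```

Because every position attends to every other position regardless of distance, a body part that shifts between frames can still be matched to itself, which is the intuition behind using the block against misalignment. With `w_z` set to zeros the block reduces to the identity, so it can be dropped into an existing network without changing its initial behaviour.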

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gait Recognition and Analysis

Methods: Residual Connection · Non-Local Operation · 1x1 Convolution · Non-Local Block