Multi-scale 3D Convolution Network for Video Based Person Re-Identification
Jianing Li, Shiliang Zhang, Tiejun Huang

TL;DR
This paper introduces a multi-scale 3D convolutional network with residual attention for efficient and effective video-based person re-identification, outperforming existing methods on standard benchmarks.
Contribution
It proposes a novel Multi-scale 3D (M3D) convolutional architecture with Residual Attention Layers (RAL) for improved spatial-temporal feature learning in person ReID.
Findings
The method outperforms existing 3D CNNs and state-of-the-art methods on the MARS, PRID2011, and iLIDS-VID datasets.
The M3D convolution network is compact, and is more efficient and easier to optimize than existing 3D convolution networks.
The two-stream design combines spatial and temporal features for robust person re-identification.
Abstract
This paper proposes a two-stream convolution network to extract spatial and temporal cues for video based person Re-Identification (ReID). A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN network. The resulting M3D convolution network introduces a fraction of parameters into the 2D CNN, but gains the ability of multi-scale temporal feature learning. With this compact architecture, M3D convolution network is also more efficient and easier to optimize than existing 3D convolution networks. The temporal stream further involves Residual Attention Layers (RAL) to refine the temporal features. By jointly learning spatial-temporal attention masks in a residual manner, RAL identifies the discriminative spatial regions and temporal cues. The other stream in our network is implemented with a 2D CNN for spatial feature…
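The abstract's claim that M3D layers add only a fraction of parameters to a 2D CNN can be illustrated with a rough count. This is a minimal sketch, not the authors' configuration: it assumes each M3D layer adds a few 3x1x1 temporal kernels alongside an existing 3x3 spatial kernel, and the channel width (256), the number of temporal scales (3), and the helper names `conv_params` and `compare` are all hypothetical choices made for illustration.

```python
def conv_params(c_in, c_out, kernel):
    """Weight count of a convolution with the given (T, H, W) kernel; bias ignored."""
    n = c_in * c_out
    for k in kernel:
        n *= k
    return n


def compare(channels=256, num_scales=3):
    """Compare parameter counts for one layer under the assumed kernel shapes."""
    # Spatial-only layer of a 2D CNN, written as a 1x3x3 kernel.
    base_2d = conv_params(channels, channels, (1, 3, 3))
    # Full 3D replacement of that layer with a 3x3x3 kernel.
    full_3d = conv_params(channels, channels, (3, 3, 3))
    # Assumed M3D-style addition: num_scales temporal 3x1x1 kernels.
    # (If the scales differ only by dilation, the weight count is unchanged.)
    temporal = num_scales * conv_params(channels, channels, (3, 1, 1))
    return {
        "2d": base_2d,
        "full_3d": full_3d,
        "m3d_style": base_2d + temporal,
        "added_by_m3d_style": temporal,
        "added_by_full_3d": full_3d - base_2d,
    }


if __name__ == "__main__":
    for name, count in compare().items():
        print(f"{name:>20}: {count:,}")
```

Under these assumptions the temporal kernels add half as many weights per layer as a full 3x3x3 kernel would. Because the abstract describes inserting only several such layers into the 2D CNN, the overhead for the whole network would be smaller still.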
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gait Recognition and Analysis
Methods: 3D Convolution · Convolution
