Multi-scale 3D Convolution Network for Video Based Person Re-Identification

Jianing Li; Shiliang Zhang; Tiejun Huang

arXiv:1811.07468 · cs.CV · November 20, 2018 · 29 cites

TL;DR

This paper introduces a multi-scale 3D convolutional network with residual attention for efficient and effective video-based person re-identification, outperforming existing methods on standard benchmarks.

Contribution

The paper proposes a Multi-scale 3D (M3D) convolution architecture with Residual Attention Layers (RAL) for improved spatial-temporal feature learning in person ReID.

Findings

01. Outperforms existing 3D CNNs and state-of-the-art methods on MARS, PRID2011, and iLIDS-VID datasets.

02. Introduces a compact, efficient, and easily optimized multi-scale 3D convolutional network.

03. Effectively combines spatial and temporal features for robust person re-identification.

Abstract

This paper proposes a two-stream convolution network to extract spatial and temporal cues for video based person Re-Identification (ReID). A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN network. The resulting M3D convolution network introduces a fraction of parameters into the 2D CNN, but gains the ability of multi-scale temporal feature learning. With this compact architecture, M3D convolution network is also more efficient and easier to optimize than existing 3D convolution networks. The temporal stream further involves Residual Attention Layers (RAL) to refine the temporal features. By jointly learning spatial-temporal attention masks in a residual manner, RAL identifies the discriminative spatial regions and temporal cues. The other stream in our network is implemented with a 2D CNN for spatial feature…
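The two ideas in the abstract, multi-scale temporal feature learning added on top of per-frame features and attention applied "in a residual manner", can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the temporal scales are realised as size-3 temporal kernels with different dilation rates, that each frame is reduced to a flat feature vector (so the attention mask here is temporal only, not spatial-temporal), and that all kernels and masks are hand-picked rather than learned. The names `temporal_conv`, `m3d_like_layer`, and `residual_attention` are hypothetical.

```python
from typing import List, Sequence

Frames = List[List[float]]  # frames[t][c]: feature of channel c at frame t (stand-in for 2D CNN features)


def temporal_conv(frames: Frames, kernel: Sequence[float], dilation: int) -> Frames:
    """Zero-padded 1D convolution along the time axis, applied to every channel.

    A larger dilation makes the same kernel span frames that are further apart.
    """
    t, c = len(frames), len(frames[0])
    half = len(kernel) // 2
    out = [[0.0] * c for _ in range(t)]
    for i in range(t):
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # index of the frame this tap reads
            if 0 <= j < t:  # taps falling outside the clip contribute zero
                for ch in range(c):
                    out[i][ch] += w * frames[j][ch]
    return out


def m3d_like_layer(frames: Frames, kernels: Sequence[Sequence[float]],
                   dilations: Sequence[int]) -> Frames:
    """Per-frame features plus the sum of temporal responses at each scale.

    Assumption: the multi-scale temporal cues are added residually to the
    spatial features, so all-zero temporal kernels leave the input unchanged.
    """
    out = [list(f) for f in frames]
    for kernel, dilation in zip(kernels, dilations):
        response = temporal_conv(frames, kernel, dilation)
        for i in range(len(frames)):
            for ch in range(len(frames[0])):
                out[i][ch] += response[i][ch]
    return out


def residual_attention(frames: Frames, masks: Sequence[float]) -> Frames:
    """Residual attention: out = x + mask * x, with one scalar mask per frame.

    A zero mask keeps the feature as is instead of suppressing it.
    """
    return [[x * (1.0 + m) for x in f] for f, m in zip(frames, masks)]


# Four frames with one channel each, two temporal scales (dilation 1 and 2).
clip = [[1.0], [2.0], [3.0], [4.0]]
box = [1.0, 1.0, 1.0]
features = m3d_like_layer(clip, kernels=[box, box], dilations=[1, 2])
refined = residual_attention(features, masks=[0.0, 0.5, 0.0, 0.0])
```

In this toy setting the residual form means that all-zero temporal kernels and zero masks reproduce the per-frame features exactly, which is one plausible reading of why the abstract describes the network as easy to optimize when layered on a 2D CNN.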

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gait Recognition and Analysis

Methods: 3D Convolution · Convolution