Revisiting Temporal Modeling for Video-based Person ReID
Jiyang Gao, Ram Nevatia

TL;DR
This paper systematically compares four temporal modeling techniques for video-based person re-identification and introduces a new attention generation network that improves performance on the MARS dataset.
Contribution
It provides a comprehensive comparison of existing temporal modeling methods and proposes a novel attention generation network for better temporal feature extraction.
Findings
The proposed method outperforms state-of-the-art techniques on the MARS dataset.
Temporal convolution-based attention improves temporal feature aggregation.
Systematic analysis clarifies the impact of different temporal modeling choices.
Abstract
Video-based person reID is an important task, which has received much attention in recent years due to the increasing demand in surveillance and camera networks. A typical video-based person reID system consists of three parts: an image-level feature extractor (e.g., a CNN), a temporal modeling method to aggregate temporal features, and a loss function. Although many methods on temporal modeling have been proposed, it is hard to directly compare these methods, because the choices of feature extractor and loss function also have a large impact on the final performance. We comprehensively study and compare four different temporal modeling methods (temporal pooling, temporal attention, RNN and 3D convnets) for video-based person reID. We also propose a new attention generation network which adopts temporal convolution to extract temporal information among frames. The evaluation is done on the…
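To make two of the temporal modeling options named in the abstract concrete, the following is a minimal NumPy sketch of temporal pooling and of attention weights produced by a convolution along the time axis. It is an illustration under stated assumptions, not the authors' network: the function names, the single-filter kernel of shape (K, D), the zero padding, and the softmax normalization are all hypothetical choices made here, and the weights are random rather than learned.

```python
import numpy as np


def temporal_avg_pool(frame_feats):
    """Temporal pooling: average T frame features (T, D) into one clip feature (D,)."""
    return frame_feats.mean(axis=0)


def temporal_conv_attention(frame_feats, kernel, bias=0.0):
    """Score each frame with a 1D convolution over time, then softmax the scores.

    frame_feats: (T, D) frame-level features from an image-level extractor.
    kernel:      (K, D) filter spanning K neighboring frames (K odd); hypothetical weights.
    Returns attention weights of shape (T,) that sum to 1.
    """
    num_frames = frame_feats.shape[0]
    k = kernel.shape[0]
    pad = k // 2
    # Zero-pad along time so every frame gets a score ("same" convolution).
    padded = np.pad(frame_feats, ((pad, pad), (0, 0)))
    scores = np.array(
        [np.sum(padded[t:t + k] * kernel) + bias for t in range(num_frames)]
    )
    scores = scores - scores.max()  # stabilize the softmax numerically
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


def temporal_attention_pool(frame_feats, weights):
    """Temporal attention: weighted sum of frame features using per-frame weights."""
    return weights @ frame_feats


# Usage with random stand-in features: a clip of 4 frames, 8-dim features.
rng = np.random.default_rng(0)
clip = rng.normal(size=(4, 8))
kernel = rng.normal(size=(3, 8)) * 0.1  # untrained, illustrative filter
attn = temporal_conv_attention(clip, kernel)
clip_feature = temporal_attention_pool(clip, attn)
```

With an all-zero kernel every frame receives the same score, so the attention-weighted feature reduces to the plain temporal average; the convolution only matters once its weights let neighboring frames influence each frame's score.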
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Human Mobility and Location-Based Analysis
Methods: Convolution
