Global-Local Temporal Representations For Video Person Re-Identification

Jianing Li; Jingdong Wang; Qi Tian; Wen Gao; Shiliang Zhang

arXiv:1908.10049 · cs.CV · April 22, 2020

TL;DR

This paper introduces GLTR, a multi-scale temporal feature representation for video person Re-Identification, combining short-term motion cues and long-term relations to improve accuracy.

Contribution

The paper presents a novel Global-Local Temporal Representation (GLTR) that effectively captures multi-scale temporal cues using dilated convolutions and self-attention for video ReID.

Findings

01 Achieves 87.02% Rank-1 accuracy on the MARS dataset

02 Outperforms existing methods on four video ReID datasets

03 Effectively models short-term and long-term temporal cues

Abstract

This paper proposes the Global-Local Temporal Representation (GLTR) to exploit multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among non-consecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate occlusions and noise in video sequences. The short- and long-term temporal cues are aggregated as the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority to existing features learned with body part cues or metric learning on four widely-used video ReID datasets. For instance,…
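The two stages named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the kernel width of 3, the dilation rates (1, 2, 4), the random weights, the scaled dot-product form of the attention, the residual connection, and the final temporal average pooling are all assumptions chosen for illustration, and the function names are hypothetical. The input is assumed to be a sequence of per-frame feature vectors of shape (T, d), such as a frame-level CNN might produce.

```python
import numpy as np


def dilated_temporal_conv(x, weight, dilation):
    """Width-k convolution along the time axis with a given dilation rate.

    x:      (T, d) per-frame features.
    weight: (k, d, d_out) kernel; k = 3 is assumed in this sketch.
    Zero padding keeps the output length equal to T.
    """
    T = x.shape[0]
    k = weight.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((T, weight.shape[2]))
    for j in range(k):
        # Tap j reads the frame at offset (j - (k - 1) / 2) * dilation.
        out += xp[j * dilation : j * dilation + T] @ weight[j]
    return out


def temporal_self_attention(f, Wq, Wk, Wv):
    """Each frame attends to every other frame (scaled dot-product form,
    an assumption), with a residual connection back to the input."""
    q, k, v = f @ Wq, f @ Wk, f @ Wv
    scores = q @ k.T / np.sqrt(q.shape[1])        # (T, T) frame-to-frame
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    return f + attn @ v


def gltr_sketch(x, dilations=(1, 2, 4), d_out=8, seed=0):
    """Short-term cues from parallel dilated branches, long-term relations
    from self-attention, then average pooling over time (all hypothetical
    hyperparameters; weights are random rather than learned)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    branches = [
        dilated_temporal_conv(x, rng.normal(scale=0.1, size=(3, d, d_out)), r)
        for r in dilations
    ]
    local = np.concatenate(branches, axis=1)      # (T, d_out * n_branches)
    c = local.shape[1]
    Wq, Wk, Wv = (rng.normal(scale=0.1, size=(c, c)) for _ in range(3))
    refined = temporal_self_attention(local, Wq, Wk, Wv)
    return refined.mean(axis=0)                   # one vector per sequence


if __name__ == "__main__":
    frames = np.random.default_rng(1).normal(size=(16, 32))  # T=16, d=32
    print(gltr_sketch(frames).shape)
```

Running the parallel branches at different dilation rates lets each output frame see neighbours at several temporal distances without lengthening the kernel, while the attention step lets any frame borrow information from any other, which is the mechanism the abstract credits with mitigating occluded or noisy frames.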
