# Convolutional Temporal Attention Model for Video-based Person Re-identification

**Authors:** Tanzila Rahman, Mrigank Rochan, Yang Wang

arXiv: 1904.04492 · 2019-04-11

## TL;DR

This paper introduces a convolutional temporal attention model that aggregates frame-level features into a single video-level feature for person re-identification, weighting informative frames more heavily to improve matching accuracy.

## Contribution

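The paper proposes a fully convolutional temporal attention model that treats video-based re-identification as a sequence labeling problem, analogous to semantic segmentation. A modified FCN outputs one attention score per frame, and these scores weight the frame-level features when they are aggregated into a video-level feature.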

## Key findings

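- Reported to outperform other state-of-the-art approaches on three benchmark datasets: iLIDS-VID, PRID-2011 and SDU-VID
- Generates attention scores that represent the importance of each frame, so more informative frames contribute more to the video-level feature
- The improvement is reported on all three datasets rather than on a single benchmark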

## Abstract

The goal of video-based person re-identification is to match two input videos, so that the distance between the two videos is small if they contain the same person. A common approach for person re-identification is to first extract image features for all frames in the video, then aggregate all the features to form a video-level feature. The video-level features of two videos can then be used to calculate the distance between the two videos. In this paper, we propose a temporal attention approach for aggregating frame-level features into a video-level feature vector for re-identification. Our method is motivated by the fact that not all frames in a video are equally informative. We propose a fully convolutional temporal attention model for generating the attention scores. Fully convolutional networks (FCNs) have been widely used in semantic segmentation for generating 2D output maps. In this paper, we formulate video-based person re-identification as a sequence labeling problem like semantic segmentation. We establish a connection between them and modify FCN to generate attention scores to represent the importance of each frame. Extensive experiments on three different benchmark datasets (i.e., iLIDS-VID, PRID-2011 and SDU-VID) show that our proposed method outperforms other state-of-the-art approaches.
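
The pipeline described in the abstract (per-frame features, one attention score per frame, weighted aggregation, then a distance between video-level features) can be illustrated with a minimal sketch. This is not the authors' architecture: the single temporal kernel, the softmax normalization, the Euclidean distance, and all function names and numbers below are assumptions chosen for illustration, since this summary does not specify the layers of the modified FCN.

```python
import math

def temporal_conv_scores(frames, kernel, bias=0.0):
    """Slide a temporal kernel over frame features: one raw score per frame.

    frames: list of T feature vectors, each of length D.
    kernel: list of K weight vectors (K odd), each of length D.
    Zero padding keeps the output length equal to T.
    """
    num_frames, width = len(frames), len(kernel)
    half = width // 2
    scores = []
    for t in range(num_frames):
        s = bias
        for k in range(width):
            idx = t + k - half
            if 0 <= idx < num_frames:  # positions outside the video count as zeros
                s += sum(w * x for w, x in zip(kernel[k], frames[idx]))
        scores.append(s)
    return scores

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(frames, kernel):
    """Attention-weighted sum of frame features -> one video-level feature."""
    weights = softmax(temporal_conv_scores(frames, kernel))
    dim = len(frames[0])
    video = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return video, weights

def video_distance(a, b):
    """Euclidean distance between two video-level features (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: 3 frames, 2-dimensional features.
kernel = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]  # hand-picked weights, not learned
video_a = [[1.0, 0.0], [3.0, 1.0], [1.0, 0.0]]
video_b = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

feat_a, weights_a = aggregate(video_a, kernel)
feat_b, weights_b = aggregate(video_b, kernel)
print(weights_a, video_distance(feat_a, feat_b))
```

In this toy setup the middle frame of `video_a` receives the largest attention weight because the hand-picked kernel responds to its first feature dimension. In a trained model the kernel weights would be learned, and several convolutional layers would typically be stacked.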

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.04492/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.04492/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1904.04492/full.md

---
Source: https://tomesphere.com/paper/1904.04492