Video-Based Convolutional Attention for Person Re-Identification
Marco Zamprogno, Marco Passon, Niki Martinel, Giuseppe Serra, Giuseppe Lancioni, Christian Micheloni, Carlo Tasso, Gian Luca Foresti

TL;DR
This paper introduces a video-based person re-identification method that uses a Siamese network with attention at the frame and video levels, and reports better performance than the state of the art on the iLIDS-VID dataset.
Contribution
It presents a novel joint attention framework that processes frame and video information concurrently within a simple architecture for person re-identification.
Findings
Outperforms state-of-the-art on iLIDS-VID dataset
Effective attention mechanisms at frame and video levels
Simple yet powerful architecture
Abstract
In this paper we consider the problem of video-based person re-identification, which is the task of associating videos of the same person captured by different, non-overlapping cameras. We propose a Siamese framework in which the video frames of the person to re-identify and those of the candidate are processed by two identical networks that produce a similarity score. We introduce attention mechanisms to capture the relevant information both at frame level (spatial information) and at video level (temporal information given by the importance of a specific frame within the sequence). One of the novelties of our approach is the joint, concurrent processing of the frame and video levels, which results in a very simple architecture. Despite this simplicity, our approach achieves better performance than the state of the art on the challenging iLIDS-VID dataset.
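The general idea in the abstract (attention over locations within a frame, attention over frames within a video, and two weight-sharing branches that yield a similarity score) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the dot-product scoring against fixed query vectors, the distance-based similarity, and in particular the two-stage ordering (spatial pooling first, temporal pooling second), which is a simplification and not the joint concurrent processing the authors describe.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Attention-weighted average of `vectors`.

    Weights are softmax(dot(v, query)); `query` stands in for learned
    attention parameters (hypothetical, fixed here for illustration).
    """
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def embed_video(video, spatial_query, temporal_query):
    """video: list of frames; each frame: list of per-location feature vectors."""
    # Frame level: weight spatial locations inside each frame.
    frame_vecs = [attention_pool(frame, spatial_query)[0] for frame in video]
    # Video level: weight frames by their importance within the sequence.
    return attention_pool(frame_vecs, temporal_query)


def similarity(video_a, video_b, spatial_query, temporal_query):
    """Siamese scoring: both videos pass through the same parameters."""
    emb_a, _ = embed_video(video_a, spatial_query, temporal_query)
    emb_b, _ = embed_video(video_b, spatial_query, temporal_query)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))
    return 1.0 / (1.0 + dist)  # in (0, 1]; equals 1.0 for identical embeddings


# Toy input: 2 frames, 2 locations per frame, 2-dimensional features.
video_a = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
video_b = [[[3.0, 0.0], [0.0, 2.0]], [[2.0, 2.0], [0.0, 0.0]]]
spatial_q, temporal_q = [1.0, 0.0], [0.0, 1.0]

score_same = similarity(video_a, video_a, spatial_q, temporal_q)
score_diff = similarity(video_a, video_b, spatial_q, temporal_q)
```

In a trained system the queries and the per-location features would come from a learned network; here they are hand-picked numbers so that the pooling and scoring steps are easy to follow.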
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Neural Network Applications
