Set Augmented Triplet Loss for Video Person Re-Identification
Pengfei Fang, Pan Ji, Lars Petersson, Mehrtash Harandi

TL;DR
This paper introduces a novel set-based triplet loss for video person re-identification, improving feature optimization at the frame level and achieving state-of-the-art results.
Contribution
It models video clips as sets and proposes a hybrid set-aware triplet loss with a new hard positive set construction strategy.
Findings
Achieves state-of-the-art performance on standard benchmarks.
Demonstrates the effectiveness of set-based distance metrics.
Improves feature learning at the frame level.
Abstract
Modern video person re-identification (re-ID) machines are often trained using a metric learning approach, supervised by a triplet loss. The triplet loss used in video re-ID is usually based on so-called clip features, each aggregated from a few frame features. In this paper, we propose to model the video clip as a set and instead study the distance between sets in the corresponding triplet loss. In contrast to the distance between clip representations, the distance between clip sets considers the pair-wise similarity of each element (i.e., frame representation) between two sets. This allows the network to directly optimize the feature representation at a frame level. Apart from the commonly-used set distance metrics (e.g., ordinary distance and Hausdorff distance), we further propose a hybrid distance metric, tailored for the set-aware triplet loss. Also, we propose a hard positive set…
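To make the set-based view concrete, the following is a minimal sketch, not the authors' implementation. It assumes each clip is a small list of frame feature vectors and that Euclidean distance is used between frames. It computes the two commonly used set distances named in the abstract (the ordinary distance, taken here as the minimum pairwise distance, and the Hausdorff distance) and plugs either one into a standard hinge-style triplet loss. The function names, the margin value, and the toy features are illustrative assumptions. The paper's hybrid metric and hard positive set construction are not reproduced, since the text above does not specify them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ordinary_distance(set_a, set_b):
    """Ordinary set distance: the smallest pairwise frame distance."""
    return min(euclidean(a, b) for a in set_a for b in set_b)


def hausdorff_distance(set_a, set_b):
    """Hausdorff distance: the largest nearest-neighbour gap in either direction."""
    a_to_b = max(min(euclidean(a, b) for b in set_b) for a in set_a)
    b_to_a = max(min(euclidean(a, b) for a in set_a) for b in set_b)
    return max(a_to_b, b_to_a)


def set_triplet_loss(anchor, positive, negative, set_distance, margin=0.3):
    """Hinge triplet loss where clips are compared as sets of frames.

    The margin of 0.3 is an illustrative default, not a value from the paper.
    """
    gap = set_distance(anchor, positive) - set_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy clips: two frames each, with 2-D features (hypothetical values).
anchor = [[0.0, 0.0], [1.0, 0.0]]
positive = [[0.0, 1.0], [1.0, 1.0]]
negative = [[3.0, 0.0], [4.0, 0.0]]

loss_ordinary = set_triplet_loss(anchor, positive, negative, ordinary_distance)
loss_hausdorff = set_triplet_loss(anchor, positive, negative, hausdorff_distance)
```

Because both distances are built from pairwise frame distances rather than from one aggregated clip vector, a gradient-based version of this loss would act on individual frame features, which is the property the abstract highlights.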
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gait Recognition and Analysis
Methods: Triplet Loss
