Video Person Re-Identification using Learned Clip Similarity Aggregation
Neeraj Matiyali, Gaurav Sharma

TL;DR
This paper proposes a learned clip similarity aggregation method for video person re-identification, utilizing 3D CNNs with RGB inputs, leading to improved or competitive results on public benchmarks.
Contribution
It introduces a learned clip similarity aggregation function that filters out hard clip pairs, and it applies 3D CNNs with RGB-only inputs to video-based re-identification, avoiding the optical flow inputs that previous works use in addition to RGB.
Findings
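Performs on par with previous works that use optical flow in addition to RGB, while using RGB inputs only.
Filters out uninformative clip pairs, e.g. where the person is not clearly visible or is in a challenging pose.
Shows better or competitive performance on three public benchmarks.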
Abstract
We address the challenging task of video-based person re-identification. Recent works have shown that splitting the video sequences into clips and then aggregating clip-based similarity is appropriate for the task. We show that using a learned clip similarity aggregation function allows filtering out hard clip pairs, e.g. where the person is not clearly visible, is in a challenging pose, or where the poses in the two clips are too different to be informative. This allows the method to focus on clip pairs which are more informative for the task. We also introduce the use of 3D CNNs for video-based re-identification and show their effectiveness by performing on par with previous works, which use optical flow in addition to RGB, while using RGB inputs only. We give quantitative results on three challenging public benchmarks and show better or competitive performance. We also validate our…
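
The idea of splitting videos into clips and aggregating clip-pair similarities with non-uniform weights can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (split_into_clips, pairwise_cosine, aggregate) are hypothetical, cosine similarity is an assumed choice of clip similarity, and the importance scores passed to aggregate are placeholders for what a learned model would output. In the paper the clip features would come from a 3D CNN; here they are plain arrays.

```python
import numpy as np

def split_into_clips(frames, clip_len):
    """Split a sequence of frames into non-overlapping clips of clip_len frames.
    Trailing frames that do not fill a whole clip are dropped (an assumption)."""
    n_clips = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def pairwise_cosine(feats_a, feats_b):
    """Cosine similarity between every clip feature in A (n, d) and B (m, d)."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return a @ b.T  # shape (n, m)

def aggregate(sim, scores):
    """Softmax-weighted average of clip-pair similarities.
    `scores` stands in for the output of a learned importance model;
    low-scoring (uninformative) pairs contribute little to the result."""
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights = weights / weights.sum()
    return float((weights * sim).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(3, 8))  # 3 clips from video A, 8-dim features
    feats_b = rng.normal(size=(2, 8))  # 2 clips from video B
    sim = pairwise_cosine(feats_a, feats_b)
    scores = rng.normal(size=sim.shape)  # placeholder for learned scores
    print("uniform average :", aggregate(sim, np.zeros_like(sim)))
    print("weighted average:", aggregate(sim, scores))
```

With all scores equal, aggregate reduces to plain averaging over clip pairs; as one score grows, the video-level similarity approaches that single pair's similarity, which is the sense in which uninformative pairs are filtered out.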
