Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification
Nanxing Meng, Qizao Wang, Bin Li, Xiangyang Xue

TL;DR
This paper introduces an unsupervised framework for video person re-identification that leverages self-supervised clustering and refined tracklet partitioning to improve feature learning without annotations.
Contribution
The paper proposes the Self-Supervised Refined Clustering (SSR-C) framework, whose Noise-Filtered Tracklet Partition (NFTP) and CSC modules enable unsupervised learning for video person re-ID by reducing tracking noise and generating reliable pseudo labels.
Findings
Achieves state-of-the-art results on MARS and DukeMTMC-VideoReID datasets.
Outperforms existing unsupervised methods and rivals supervised approaches.
Demonstrates robustness to noisy tracking data.
Abstract
With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply average the frame features of each tracklet, overlooking unexpected variations and inherent identity consistency within tracklets. In this paper, we propose the Self-Supervised Refined Clustering (SSR-C) framework without relying on any annotation or auxiliary information to promote unsupervised video person re-identification. Specifically, we first propose the Noise-Filtered Tracklet Partition (NFTP) module to reduce the feature bias of tracklets caused by noisy tracking results, and sequentially…
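To make the contrast in the abstract concrete, here is a minimal sketch of plain frame averaging next to a simple filtered average. It is not the paper's NFTP module, whose details the abstract does not give. The filter (dropping frames whose cosine similarity to the tracklet mean falls below a threshold), the function names, and the `min_sim` parameter are all assumptions made for illustration.

```python
import math

def mean_feature(frames):
    """Average a list of equal-length frame feature vectors."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def filtered_mean_feature(frames, min_sim=0.5):
    """Hypothetical filter (an assumption, not NFTP): drop frames that are
    dissimilar to the tracklet mean, then average the remaining frames."""
    center = mean_feature(frames)
    kept = [f for f in frames if cosine(f, center) >= min_sim]
    # Fall back to all frames if the threshold removes everything.
    return mean_feature(kept or frames)

# Toy tracklet: three consistent frames and one frame from a tracking error.
tracklet = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
plain = mean_feature(tracklet)
filtered = filtered_mean_feature(tracklet)
```

In this toy tracklet the last frame pulls the plain average away from the other three, while the filtered average stays close to them. A real system would work on learned embeddings and would likely need a more careful criterion than a single global threshold.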
