Self-Supervised Pre-Training for Transformer-Based Person Re-Identification
Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin

TL;DR
This paper introduces a self-supervised pre-training approach for transformer-based person re-identification (ReID) that reduces the amount of pre-training data needed and narrows the domain gap between pre-training and ReID datasets, leading to state-of-the-art results on benchmark datasets.
Contribution
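It proposes self-supervised pre-training of a Vision Transformer (ViT) on unlabelled person images, a data selection strategy based on the Catastrophic Forgetting Score (CFS), and a ReID-specific module, the IBN-based convolution stem (ICS), improving ReID performance with less pre-training data.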
Findings
Self-supervised ViT surpasses supervised ImageNet pre-training in ReID.
CFS-based data sampling effectively reduces domain gap.
Achieves state-of-the-art accuracy on Market-1501 and MSMT17.
Abstract
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID). However, due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset (e.g. ImageNet-21K) to boost the performance because of the strong data fitting ability of the transformer. To address this challenge, this work aims to mitigate the gap between the pre-training and ReID datasets from the perspectives of data and model structure, respectively. We first investigate self-supervised learning (SSL) methods with a Vision Transformer (ViT) pre-trained on unlabelled person images (the LUPerson dataset), and empirically find that it significantly surpasses ImageNet supervised pre-training models on ReID tasks. To further reduce the domain gap and accelerate the pre-training, the Catastrophic Forgetting Score (CFS) is proposed to evaluate the gap…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Softmax · Residual Connection · Layer Normalization · Adam
