Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Hao Luo; Pichao Wang; Yi Xu; Feng Ding; Yanxin Zhou; Fan Wang; Hao Li; Rong Jin

arXiv:2111.12084·cs.CV·November 24, 2021·40 cites

Open Access 3 Repos

TL;DR

This paper introduces a self-supervised pre-training approach for transformer-based person re-identification that reduces pre-training data requirements and narrows the domain gap, achieving state-of-the-art results on benchmark datasets.

Contribution

The paper investigates self-supervised learning with ViT on unlabelled person images, and proposes a data selection strategy based on the Catastrophic Forgetting Score (CFS) and a ReID-specific module, ICS, improving ReID performance with less pre-training data.

Findings

01 Self-supervised ViT surpasses supervised ImageNet pre-training in ReID.

02 CFS-based data sampling effectively reduces domain gap.

03 Achieves state-of-the-art accuracy on Market-1501 and MSMT17.

Abstract

Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID). However, because of the domain gap between ImageNet and ReID datasets and the strong data-fitting ability of the transformer, it usually needs a larger pre-training dataset (e.g. ImageNet-21K) to boost performance. To address this challenge, this work aims to mitigate the gap between the pre-training and ReID datasets from the perspectives of data and model structure, respectively. We first investigate self-supervised learning (SSL) methods with a Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find that it significantly surpasses ImageNet supervised pre-training models on ReID tasks. To further reduce the domain gap and accelerate pre-training, the Catastrophic Forgetting Score (CFS) is proposed to evaluate the gap…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Softmax · Residual Connection · Layer Normalization · Adam