Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification
Jiachen Li, Menglin Wang, Xiaojin Gong

TL;DR
This paper introduces a transformer-based approach to unsupervised person re-identification (Re-ID) that extracts multi-grained (global and part-level) features and trains them with offline and online contrastive losses built on O2CAP, and it reports better performance than existing unsupervised methods.
Contribution
It proposes a dual-branch Vision Transformer architecture with multi-grained feature extraction and a contrastive learning framework tailored for unsupervised person Re-ID.
Findings
Outperforms state-of-the-art unsupervised Re-ID methods on multiple datasets.
Narrows the performance gap between unsupervised and supervised Re-ID.
Demonstrates the effectiveness of multi-grained features from transformers in unsupervised learning.
Abstract
Multi-grained features extracted from convolutional neural networks (CNNs) have demonstrated strong discrimination ability in supervised person re-identification (Re-ID) tasks. Inspired by this, this work investigates how to extract multi-grained features from a pure transformer network to address the unsupervised Re-ID problem, which is label-free but much more challenging. To this end, we build a dual-branch network architecture based upon a modified Vision Transformer (ViT). The local tokens output by each branch are reshaped and then uniformly partitioned into multiple stripes to generate part-level features, while the global tokens of the two branches are averaged to produce a global feature. Further, based upon offline-online associated camera-aware proxies (O2CAP), a top-performing unsupervised Re-ID method, we define offline and online contrastive learning losses…
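The feature extraction step described in the abstract (reshape the local tokens, split them uniformly into stripes, and average the two global tokens) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the row-major token order, the horizontal stripe orientation, and the mean pooling within each stripe are all assumptions made for illustration; the abstract states only that tokens are reshaped and uniformly partitioned. Plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
from typing import List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def part_features(local_tokens: List[Vector], grid_h: int, grid_w: int,
                  num_stripes: int) -> List[Vector]:
    """Treat a flat token sequence as a grid_h x grid_w grid (row-major,
    an assumption), split the rows uniformly into horizontal stripes, and
    mean-pool each stripe (also an assumption) into one part-level feature."""
    if len(local_tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    if grid_h % num_stripes != 0:
        raise ValueError("grid_h must be divisible by num_stripes")
    # In row-major order, a horizontal stripe is a contiguous run of tokens.
    tokens_per_stripe = (grid_h // num_stripes) * grid_w
    return [
        mean_vectors(local_tokens[s * tokens_per_stripe:(s + 1) * tokens_per_stripe])
        for s in range(num_stripes)
    ]


def global_feature(global_token_a: Vector, global_token_b: Vector) -> Vector:
    """Average the global tokens of the two branches into one global feature."""
    return mean_vectors([global_token_a, global_token_b])


# Toy input: a 4 x 2 token grid with 1-dimensional tokens valued 0..7.
tokens = [[float(i)] for i in range(8)]
print(part_features(tokens, grid_h=4, grid_w=2, num_stripes=2))  # [[1.5], [5.5]]
print(global_feature([1.0, 3.0], [3.0, 5.0]))  # [2.0, 4.0]
```

In this sketch each branch would call part_features with its own num_stripes, so the two branches yield part-level features at different granularities, while global_feature combines one global token from each branch.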
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Gait Recognition and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Adam · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing
