Rethinking Person Re-Identification via Semantic-Based Pretraining
Suncheng Xiang, Jingsheng Gao, Zirui Zhang, Mengyuan Guan, Binjie Yan, Ting Liu, Dahong Qian, Yuzhuo Fu

TL;DR
This paper proposes VTBR, a semantic-based pretraining method for person re-identification (Re-ID) that learns visual representations from the dense captions of a new dataset, FineGPR-C, and achieves results competitive with traditional ImageNet pretraining while using fewer images.
Contribution
Introduces VTBR, a semantic-based pretraining approach that uses dense captions, together with a new caption dataset, FineGPR-C, reducing the number of images needed to initialize person Re-ID models effectively.
Findings
VTBR achieves comparable performance to ImageNet pretraining.
Uses up to 1.4x fewer pretraining images than ImageNet pretraining.
Demonstrates the effectiveness of semantic pretraining in Re-ID tasks.
Abstract
Pretraining is a dominant paradigm in computer vision. Supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent works show the surprising result that CNN-based pretraining on ImageNet has limited impact on Re-ID systems because of the large domain gap between ImageNet and person Re-ID data. Seeking an alternative to traditional pretraining, we investigate semantic-based pretraining as a way to exploit additional textual data instead of relying on ImageNet pretraining. Specifically, we manually construct FineGPR-C, a diversified caption dataset and the first of its kind for person Re-ID. Based on it, we propose VTBR, a purely semantic-based pretraining approach that uses dense captions to learn visual representations from fewer images. We train convolutional neural networks from scratch on the captions…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Neural Network Applications
