Image Re-Identification: Where Self-supervision Meets Vision-Language Learning
Bin Wang, Yuying Liang, Lei Cai, Huakun Huang, and Huanqiang Zeng

TL;DR
This paper introduces SVLL-ReID, a novel two-stage training method that combines self-supervision with pre-trained CLIP to improve image re-identification performance without relying on explicit text labels.
Contribution
The authors present SVLL-ReID as the first attempt to integrate self-supervision with pre-trained CLIP for image ReID, using a two-stage training process that makes the learnable text prompts more distinguishable and the image features more discriminative.
Findings
SVLL-ReID outperforms state-of-the-art methods on six benchmarks.
Language self-supervision improves text prompt distinguishability.
Vision self-supervision enhances image feature discriminability.
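The summary does not give the loss functions used in either stage. As a rough illustration of what "distinguishable prompts" and "discriminative image features" mean in a CLIP-style setup, the sketch below scores image features against one text feature per identity with a temperature-scaled cross-entropy. The function names, the temperature value, and the use of NumPy are assumptions made for illustration, not the authors' implementation, and the self-supervision terms themselves are not modelled.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def image_to_text_loss(image_feats, text_feats, labels, temperature=0.07):
    """Hypothetical CLIP-style loss: each image should match its identity's prompt.

    image_feats: (batch, dim) image features
    text_feats:  (num_ids, dim) one text-prompt feature per identity
    labels:      (batch,) integer identity index of each image
    """
    img = l2_normalize(np.asarray(image_feats, dtype=float))
    txt = l2_normalize(np.asarray(text_feats, dtype=float))
    logits = img @ txt.T / temperature                 # (batch, num_ids)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    return float(-log_probs[np.arange(labels.shape[0]), labels].mean())

# Toy data: 4 identities, 6 images whose features sit close to their identity's prompt.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(4, 8))
labels = np.array([0, 1, 2, 3, 0, 1])
image_feats = text_feats[labels] + 0.01 * rng.normal(size=(6, 8))

aligned = image_to_text_loss(image_feats, text_feats, labels)
mismatched = image_to_text_loss(image_feats, text_feats, (labels + 1) % 4)
print(aligned < mismatched)
```

Under this reading, the more separable the per-identity text features are, the lower the loss can go for correctly matched images, which is one way to interpret why both prompt distinguishability and image feature discriminability matter.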
Abstract
Recently, large-scale vision-language pre-trained models like CLIP have shown impressive performance in image re-identification (ReID). In this work, we explore whether self-supervision can aid in the use of CLIP for image ReID tasks. Specifically, we propose SVLL-ReID, the first attempt to integrate self-supervision and pre-trained CLIP via two training stages to facilitate image ReID. We observe that: 1) incorporating language self-supervision in the first training stage can make the learnable text prompts more distinguishable, and 2) incorporating vision self-supervision in the second training stage can make the image features learned by the image encoder more discriminative. These observations imply that: 1) the text prompt learning in the first stage can benefit from the language self-supervision, and 2) the image feature learning in the second stage can benefit from the vision…
Taxonomy
Topics: Semantic Web and Ontologies · Management, Economics, and Public Policy · Wikis in Education and Collaboration
Methods: Contrastive Language-Image Pre-training
