Person Re-ID in 2025: Supervised, Self-Supervised, and Language-Aligned. What Works?
Lakshman Balasubramanian

TL;DR
This paper evaluates training paradigms for person re-identification and finds that language-aligned models, surprisingly, outperform supervised models in cross-domain scenarios, which points to their potential for robust generalization.
Contribution
The study systematically compares supervised, self-supervised, and language-aligned models for ReID, demonstrating the robustness of language-aligned models across domains.
Findings
Supervised models perform well within their training domain.
Language-aligned models show strong cross-domain robustness.
Supervised models struggle with cross-domain generalization.
Abstract
Person Re-Identification (ReID) remains a challenging problem in computer vision. This work reviews various training paradigms, evaluates the robustness of state-of-the-art ReID models in cross-domain applications, and examines the role of foundation models in improving generalization through richer, more transferable visual representations. We compare three training paradigms: supervised, self-supervised, and language-aligned models. The study aims to answer the following questions: Can supervised models generalize in cross-domain scenarios? How do foundation models like SigLIP2 perform on ReID tasks? What are the weaknesses of current supervised and foundation models for ReID? We conducted the analysis across 11 models and 9 datasets. Our results show a clear split: supervised models dominate their training domain but crumble on cross-domain data.…
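The abstract does not say which metric the comparison uses. As an illustration only, here is a minimal sketch of rank-1 accuracy, a score commonly reported in ReID retrieval: a query counts as a hit when its most similar gallery embedding carries the same identity. The function names, the toy embeddings, and the choice of cosine similarity are assumptions for this example, not the paper's protocol. Standard benchmarks usually also exclude same-camera matches, which this sketch omits.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_accuracy(query_emb, query_ids, gallery_emb, gallery_ids):
    """Fraction of queries whose nearest gallery embedding shares their identity."""
    hits = 0
    for q, qid in zip(query_emb, query_ids):
        # Index of the gallery embedding most similar to this query.
        best = max(range(len(gallery_emb)), key=lambda j: cosine(q, gallery_emb[j]))
        hits += gallery_ids[best] == qid
    return hits / len(query_ids)


# Hypothetical toy data: three gallery identities, three queries.
gallery_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_ids = [0, 1, 2]
query_emb = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
query_ids = [0, 1, 2]

acc = rank1_accuracy(query_emb, query_ids, gallery_emb, gallery_ids)
print(f"rank-1 accuracy: {acc:.3f}")
```

In a cross-domain setting, the same function would be applied to embeddings from a dataset the model was not trained on; a drop relative to the in-domain score is the kind of gap the abstract describes for supervised models.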
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Face recognition and analysis
