TL;DR
This paper introduces PAD (Prompt-Anchored vision-text Distillation), a vision-text distillation framework that uses a frozen text encoder and adaptive prompts to improve lifelong person re-identification across multiple domains.
Contribution
It uses the frozen text encoder of a pretrained vision-language model as a stable semantic anchor, and distills prompts to preserve semantic stability while adapting to new domains in lifelong person re-identification.
Findings
PAD outperforms state-of-the-art methods on multiple benchmarks.
The approach maintains a strong balance between stability and plasticity.
Extensive experiments validate the effectiveness of the proposed framework.
Abstract
Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the frozen text encoder in pretrained vision-language models can serve as a stable semantic anchor across domains. To decouple the roles of vision and text, we propose Prompt-Anchored vision-text Distillation (PAD), an asymmetric vision-text framework for semantic alignment and cross-domain generalization. On the textual side, we distill prompts to preserve vision-text alignment under a fixed…
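To make the idea of distilling prompts against a frozen text encoder more concrete, here is a minimal sketch. It is not the paper's implementation: the encoder is a hypothetical fixed linear map (`FROZEN_W`), the loss is assumed to be one minus cosine similarity, and the names `encode_text`, `distill_loss`, and `distill_step` are illustrative only. It shows that, with the encoder held fixed, a new prompt can be pulled toward the text features of an old prompt by updating the prompt alone.

```python
import math

# Hypothetical stand-in for a frozen text encoder: a fixed linear map whose
# weights are never updated. A real system would use a pretrained model.
FROZEN_W = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.4]]


def encode_text(prompt):
    """Map a prompt vector to text features with the frozen weights."""
    return [sum(w * p for w, p in zip(row, prompt)) for row in FROZEN_W]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distill_loss(old_prompt, new_prompt):
    """Assumed distillation loss: 1 - cosine similarity of text features."""
    return 1.0 - cosine(encode_text(old_prompt), encode_text(new_prompt))


def distill_step(old_prompt, new_prompt, lr=0.1, eps=1e-6):
    """One gradient-descent step on the new prompt only.

    The gradient is estimated with central finite differences so the
    sketch needs no autodiff library; the encoder weights stay untouched.
    """
    grad = []
    for i in range(len(new_prompt)):
        plus = list(new_prompt)
        minus = list(new_prompt)
        plus[i] += eps
        minus[i] -= eps
        diff = distill_loss(old_prompt, plus) - distill_loss(old_prompt, minus)
        grad.append(diff / (2 * eps))
    return [p - lr * g for p, g in zip(new_prompt, grad)]


# Toy prompts (arbitrary values chosen for illustration).
old_prompt = [1.0, 0.0, 0.5]
new_prompt = [0.2, 1.0, -0.3]

loss_before = distill_loss(old_prompt, new_prompt)
for _ in range(50):
    new_prompt = distill_step(old_prompt, new_prompt)
loss_after = distill_loss(old_prompt, new_prompt)
```

Because only the prompt vector is updated, the text feature space itself never moves, which is the sense in which a frozen encoder can act as an anchor. In practice the finite-difference gradient would be replaced by automatic differentiation in a deep learning framework.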
