Vision-Language Attribute Disentanglement and Reinforcement for Lifelong Person Re-Identification
Kunlun Xu, Haotong Cheng, Jiangmeng Li, Xu Zou, Jiahuan Zhou

TL;DR
This paper introduces VLADR, a novel approach leveraging vision-language models to improve lifelong person re-identification by disentangling and reinforcing human attributes across domains, enhancing knowledge transfer and reducing forgetting.
Contribution
The paper proposes a new VLM-driven method with multi-grain attribute disentanglement and cross-modal reinforcement for improved lifelong person re-identification.
Findings
Outperforms state-of-the-art LReID methods in both anti-forgetting and generalization.
Effectively models global and local human attributes.
Enhances inter-domain knowledge transfer.
Abstract
Lifelong person re-identification (LReID) aims to learn from varying domains to obtain a unified person retrieval model. Existing LReID approaches typically focus on learning from scratch or a visual classification-pretrained model, while the Vision-Language Model (VLM) has shown generalizable knowledge in a variety of tasks. Although existing methods can be directly adapted to the VLM, since they only consider global-aware learning, the fine-grained attribute knowledge is underleveraged, leading to limited acquisition and anti-forgetting capacity. To address this problem, we introduce a novel VLM-driven LReID approach named Vision-Language Attribute Disentanglement and Reinforcement (VLADR). Our key idea is to explicitly model the universally shared human attributes to improve inter-domain knowledge transfer, thereby effectively utilizing historical knowledge to reinforce new knowledge…
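The abstract is truncated here and does not say how attributes are modeled, so the following is only a generic illustration of attribute knowledge in a shared vision-language embedding space. It is not VLADR itself. It scores one image embedding against a few attribute text embeddings by cosine similarity and normalizes with a softmax, in the style of CLIP-like zero-shot matching. The function names, the attribute list, the temperature value, and the toy vectors are all assumptions made for this sketch. In practice the embeddings would come from a VLM's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attribute_scores(image_emb, attribute_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities.

    image_emb: one image embedding (hypothetical encoder output).
    attribute_embs: dict mapping an attribute phrase to its text embedding.
    temperature: assumed scaling constant, not taken from the paper.
    """
    logits = {
        name: cosine(image_emb, emb) / temperature
        for name, emb in attribute_embs.items()
    }
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits.values())
    exps = {name: math.exp(l - max_logit) for name, l in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Toy hand-written vectors standing in for real encoder outputs.
attribute_embs = {
    "backpack": [1.0, 0.0, 0.0],
    "long hair": [0.0, 1.0, 0.0],
    "red shirt": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.1, 0.2]

scores = attribute_scores(image_emb, attribute_embs)
best = max(scores, key=scores.get)
```

Because attribute phrases such as "backpack" describe people independently of the camera or dataset, scores of this kind are one plausible way to carry knowledge across domains, which is the intuition the abstract points to.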
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Multimodal Machine Learning Applications
