Vision-Language Attribute Disentanglement and Reinforcement for Lifelong Person Re-Identification

Kunlun Xu; Haotong Cheng; Jiangmeng Li; Xu Zou; Jiahuan Zhou

arXiv:2603.19678 · cs.CV · March 23, 2026

TL;DR

This paper introduces VLADR, a novel approach leveraging vision-language models to improve lifelong person re-identification by disentangling and reinforcing human attributes across domains, enhancing knowledge transfer and reducing forgetting.

Contribution

The paper proposes a new VLM-driven method with multi-grain attribute disentanglement and cross-modal reinforcement for improved lifelong person re-identification.

Findings

01 Outperforms state-of-the-art methods in anti-forgetting and generalization.

02 Effectively models global and local human attributes.

03 Enhances inter-domain knowledge transfer.

Abstract

Lifelong person re-identification (LReID) aims to learn from varying domains to obtain a unified person retrieval model. Existing LReID approaches typically focus on learning from scratch or from a model pretrained on visual classification, even though Vision-Language Models (VLMs) have shown generalizable knowledge across a variety of tasks. Although existing methods can be directly adapted to a VLM, they consider only global-aware learning, so fine-grained attribute knowledge is underleveraged, which limits both knowledge acquisition and anti-forgetting capacity. To address this problem, we introduce a novel VLM-driven LReID approach named Vision-Language Attribute Disentanglement and Reinforcement (VLADR). Our key idea is to explicitly model the universally shared human attributes to improve inter-domain knowledge transfer, thereby effectively utilizing historical knowledge to reinforce new knowledge…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Multimodal Machine Learning Applications