ALADIN: Attribute-Language Distillation Network for Person Re-Identification

Wang Zhou; Boran Duan; Haojun Ai; Ruiqi Lan; Ziyue Zhou

arXiv:2603.21482 · cs.CV · April 1, 2026

TL;DR

ALADIN introduces a novel attribute-language distillation approach for person re-identification, enhancing fine-grained attribute understanding and robustness by leveraging CLIP and multimodal LLMs.

Contribution

It proposes a new attribute-local alignment and distillation framework that improves ReID performance and interpretability over existing global feature-based methods.

Findings

01. Significant performance gains on Market-1501, DukeMTMC-reID, and MSMT17 datasets.

02. Enhanced robustness under occlusions through attribute-local distillation.

03. Better generalization and interpretability compared to CNN, Transformer, and CLIP-based methods.

Abstract

Recent vision-language models such as CLIP provide strong cross-modal alignment, but current CLIP-guided ReID pipelines rely on global features and fixed prompts. This limits their ability to capture fine-grained attribute cues and adapt to diverse appearances. We propose ALADIN, an attribute-language distillation network that distills knowledge from a frozen CLIP teacher to a lightweight ReID student. ALADIN introduces fine-grained attribute-local alignment to establish adaptive text-visual correspondence and robust representation learning. A Scene-Aware Prompt Generator produces image-specific soft prompts to facilitate adaptive alignment. Attribute-local distillation enforces consistency between textual attributes and local visual features, significantly enhancing robustness under occlusions. Furthermore, we employ cross-modal contrastive and relation distillation to preserve the…
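
The cross-modal contrastive and relation distillation terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' formulation, which the truncated abstract does not give. It assumes a generic relational distillation loss (matching pairwise cosine-similarity matrices of student and teacher features) and a symmetric CLIP-style InfoNCE loss. The function names, the temperature of 0.07, and the mean-squared-error matching are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def relation_distillation_loss(student_feats, teacher_feats):
    """Assumed form: MSE between the batch-wise cosine-similarity matrices.

    Both matrices are (batch, batch), so the student and teacher
    embedding dimensions are allowed to differ.
    """
    s = l2_normalize(student_feats)
    t = l2_normalize(teacher_feats)
    rel_student = s @ s.T
    rel_teacher = t @ t.T
    return float(np.mean((rel_student - rel_teacher) ** 2))


def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Assumed form: symmetric InfoNCE where row i of each input is a matched pair."""
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_feats)
    logits = img @ txt.T / temperature

    def cross_entropy_on_diagonal(l):
        # Numerically stable log-softmax over each row, then pick the matched pair.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    image_to_text = cross_entropy_on_diagonal(logits)
    text_to_image = cross_entropy_on_diagonal(logits.T)
    return float(0.5 * (image_to_text + text_to_image))
```

In a setup like the one the abstract describes, `teacher_feats` would come from the frozen CLIP teacher and `student_feats` from the lightweight ReID student, with gradients flowing only through the student. Comparing similarity structure rather than raw features is one common way to distill across embedding spaces of different widths.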
