From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

Aotian Zheng; Winston Sun; Bahaa Alattar; Vitaly Ablavsky; Jenq-Neng Hwang

arXiv:2604.22190 · cs.CV · April 27, 2026

TL;DR

This paper introduces SAGA-ReID, a novel method for person re-identification that reconstructs identity features by aligning patch tokens with text-anchored vectors, improving robustness under occlusion and cross-camera variation.

Contribution

SAGA-ReID emphasizes spatially stable evidence by aligning patch tokens with anchor vectors parameterized in CLIP's text embedding space, and it outperforms global pooling, especially under occlusion.

Findings

01

SAGA-ReID improves Rank-1 by up to +10.6 on occluded benchmarks.

02

Its advantage over global pooling grows as occlusion increases.

03

Structured reconstruction addresses limitations of backbone quality and architecture.

Abstract

CLIP-based person re-identification (ReID) methods aggregate spatial features into a single global [CLS] token optimized for image-text alignment rather than spatial selectivity, making representations fragile under occlusion and cross-camera variation. We propose SAGA-ReID, which reconstructs identity representations by aligning intermediate patch tokens with anchor vectors parameterized in CLIP's text embedding space, emphasizing spatially stable evidence while suppressing corrupted or absent regions, without requiring textual descriptions of individual images. Controlled experiments isolate the aggregation mechanism under two qualitatively distinct conditions (synthetic masking, where identity signal is absent, and realistic human distractors, where an overlapping person introduces semantically confusing signal), with SAGA's advantage over global pooling growing…
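The contrast between global pooling and anchor-guided aggregation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax weighting, the temperature of 0.07, the concatenation of per-anchor features, and the assumption that patch tokens and anchors already share one embedding dimension are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length (all-zero vectors stay zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def anchor_guided_aggregate(patches, anchors, temperature=0.07):
    """Hypothetical anchor-guided aggregation (illustrative assumption).

    patches: (N, D) patch tokens, assumed already in the anchors' space.
    anchors: (K, D) anchor vectors (learned in practice; given here).
    Returns a (K * D,) descriptor and the (K, N) attention weights.
    """
    # Cosine similarity between every anchor and every patch: (K, N).
    sim = l2_normalize(anchors) @ l2_normalize(patches).T
    # Each anchor spreads a unit of attention over the patches.
    weights = softmax(sim / temperature, axis=1)
    # Per-anchor weighted sum of patch tokens: (K, D).
    parts = weights @ patches
    return parts.reshape(-1), weights


def global_mean_pool(patches):
    """Baseline: every patch contributes equally."""
    return patches.mean(axis=0)


# Toy usage with random data standing in for real patch tokens.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 512))
anchors = rng.normal(size=(4, 512))
descriptor, weights = anchor_guided_aggregate(patches, anchors)
pooled = global_mean_pool(patches)
```

In this sketch, a low temperature concentrates each anchor's weights on the patches most similar to it, so patches that carry little anchor-aligned evidence contribute less to the descriptor, whereas mean pooling weights every patch equally regardless of content.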

Code & Models

Repositories

ipl-uw/Structured-Anchor-Guided-Aggregation-for-ReID
github