Beyond Visual Cues: Semantic-Driven Token Filtering and Expert Routing for Anytime Person ReID
Jiaxuan Li, Xin Wen, Zhihang Li

TL;DR
This paper introduces STFER, a novel framework leveraging large vision-language models to generate identity-consistent semantic text, improving person re-identification across diverse conditions and outperforming existing methods.
Contribution
The paper proposes a semantic-driven approach using LVLMs for token filtering and expert routing, enhancing robustness to clothing changes and modality shifts in person ReID.
Findings
Achieves state-of-the-art results on the AT-USTC dataset.
Demonstrates superior generalization across 5 ReID benchmarks.
Outperforms existing methods under challenging conditions such as day/night modality shifts and clothing changes.
Abstract
Any-Time Person Re-identification (AT-ReID) requires robust retrieval of target individuals under arbitrary conditions, encompassing both modality shifts (daytime and nighttime) and extensive clothing-change scenarios, ranging from short-term to long-term intervals. However, existing methods rely heavily on purely visual features, which are prone to change with environmental and temporal factors, resulting in significant performance deterioration in scenarios involving illumination-induced modality shifts or clothing changes. In this paper, we propose Semantic-driven Token Filtering and Expert Routing (STFER), a novel framework that leverages the ability of Large Vision-Language Models (LVLMs) to generate identity-consistent text, which provides identity-discriminative features that are robust to both clothing variations and cross-modality shifts between RGB and IR.…
