SCING: Towards More Efficient and Robust Person Re-Identification through Selective Cross-modal Prompt Tuning
Yunfei Xie, Yuxuan Cheng, Juncheng Wu, Haoyu Zhang, Yuyin Zhou, Shoudong Han

TL;DR
SCING introduces a lightweight, cross-modal prompt tuning framework that improves person re-identification by enhancing alignment and robustness while reducing computational costs, outperforming existing methods on multiple benchmarks.
Contribution
The paper proposes SCING, a framework built on two components, Selective Visual Prompt Fusion (SVIP) and Perturbation-Driven Consistency Alignment (PDCA), for efficient and robust person ReID via cross-modal prompt tuning without heavy adapters.
Findings
Achieves state-of-the-art performance on multiple ReID benchmarks.
Reduces computational overhead compared to existing adapter-based methods.
Enhances robustness against real-world image perturbations.
Abstract
Recent advancements in adapting vision-language pre-training models like CLIP for person re-identification (ReID) tasks often rely on complex adapter designs or modality-specific tuning while neglecting cross-modal interaction, leading to high computational costs or suboptimal alignment. To address these limitations, we propose a simple yet effective framework named Selective Cross-modal Prompt Tuning (SCING) that enhances cross-modal alignment and robustness against real-world perturbations. Our method introduces two key innovations. First, we propose Selective Visual Prompt Fusion (SVIP), a lightweight module that dynamically injects discriminative visual features into text prompts via a cross-modal gating mechanism. Second, Perturbation-Driven Consistency Alignment (PDCA) is a dual-path training strategy that enforces invariant feature alignment under random image…
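To make the idea of a cross-modal gate concrete, here is a minimal sketch of one way visual features could be injected into text prompt tokens. It is not the authors' SVIP implementation: the function name `gated_prompt_fusion`, the element-wise sigmoid gate, and the parameters `w_text`, `w_vis`, and `bias` are all assumptions made for illustration, and plain Python lists stand in for real CLIP embeddings.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_prompt_fusion(prompt_tokens, visual_feat, w_text, w_vis, bias):
    """Add a gated copy of a visual feature to every text prompt token.

    Hypothetical illustration only (not the paper's module):
      prompt_tokens: list of token vectors, each of length d
      visual_feat:   one image feature vector of length d
      w_text, w_vis, bias: assumed gate parameters, each of length d
    """
    fused = []
    for token in prompt_tokens:
        new_token = []
        for t, v, wt, wv, b in zip(token, visual_feat, w_text, w_vis, bias):
            # The gate decides, per dimension, how much visual signal passes.
            gate = sigmoid(wt * t + wv * v + b)
            new_token.append(t + gate * v)
        fused.append(new_token)
    return fused


if __name__ == "__main__":
    prompts = [[1.0, 2.0], [0.5, -0.5]]
    visual = [0.2, 0.4]
    zeros = [0.0, 0.0]
    # With all gate parameters at zero, every gate equals 0.5.
    print(gated_prompt_fusion(prompts, visual, zeros, zeros, zeros))
```

In this sketch a strongly negative bias closes the gate and leaves the text prompt essentially unchanged, while a strongly positive bias lets the visual feature through in full; a learned gate would sit between these extremes.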
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Multimodal Machine Learning Applications
