Causal Bootstrapped Alignment for Unsupervised Video-Based Visible-Infrared Person Re-Identification
Shuang Li, Jiaxu Leng, Changjiang Kuang, Mingpi Tan, Yu Yuan, Xinbo Gao

TL;DR
This paper introduces an unsupervised learning framework for video-based visible-infrared person re-identification. It addresses modality bias and the clustering granularity imbalance between visible and infrared modalities through causal interventions and prototype-guided refinement.
Contribution
The paper proposes Causal Bootstrapped Alignment (CBA), which combines causal interventions and uncertainty refinement to improve unsupervised cross-modality person re-identification.
Findings
CBA outperforms existing methods on HITSZ-VCM and BUPTCampus benchmarks.
Sequence-level causal interventions improve representation quality.
Prototype-guided refinement resolves cross-modality clustering mismatch.
Abstract
Video-based Visible-Infrared Person Re-Identification (VVI-ReID) is a critical technique for all-day surveillance, where temporal information provides additional cues beyond static images. However, existing approaches rely heavily on fully supervised learning with expensive cross-modality annotations, which limits scalability. To address this issue, we investigate Unsupervised Learning for VVI-ReID (USL-VVI-ReID), which learns identity-discriminative representations directly from unlabeled video tracklets. Directly extending image-based unsupervised visible-infrared ReID (USL-VI-ReID) methods to this setting with generic pretrained encoders leads to suboptimal performance. Such encoders suffer from weak identity discrimination and strong modality bias, resulting in severe intra-modality identity confusion and a pronounced clustering granularity imbalance between the visible and infrared modalities. These issues jointly degrade pseudo-label reliability and hinder effective cross-modality…
