Causal Bootstrapped Alignment for Unsupervised Video-Based Visible-Infrared Person Re-Identification

Shuang Li; Jiaxu Leng; Changjiang Kuang; Mingpi Tan; Yu Yuan; Xinbo Gao

arXiv:2604.15631·cs.CV·April 20, 2026

TL;DR

This paper introduces an unsupervised learning framework for video-based visible-infrared person re-identification. It addresses modality bias and the clustering granularity imbalance between visible and infrared modalities through causal interventions and prototype-guided refinement.

Contribution

It proposes Causal Bootstrapped Alignment (CBA), combining causal interventions and uncertainty refinement to improve unsupervised cross-modality person re-identification.

Findings

01. CBA outperforms existing methods on the HITSZ-VCM and BUPTCampus benchmarks.

02. Sequence-level causal interventions improve representation quality.

03. Prototype-guided refinement resolves cross-modality clustering mismatch.

Abstract

Video-based Visible-Infrared Person Re-Identification (VVI-ReID) is a critical technique for all-day surveillance, where temporal information provides additional cues beyond static images. However, existing approaches rely heavily on fully supervised learning with expensive cross-modality annotations, which limits scalability. To address this issue, we investigate Unsupervised Learning for VVI-ReID (USL-VVI-ReID), which learns identity-discriminative representations directly from unlabeled video tracklets. Directly extending image-based USL-VI-ReID methods to this setting with generic pretrained encoders leads to suboptimal performance. Such encoders suffer from weak identity discrimination and strong modality bias, resulting in severe intra-modality identity confusion and a pronounced clustering granularity imbalance between the visible and infrared modalities. These issues jointly degrade pseudo-label reliability and hinder effective cross-modality…
