A CLIP-based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving
Jialin Li, Shuqi Wu, Ning Wang

TL;DR
This paper introduces UMM, a lightweight CLIP-based framework that improves pedestrian re-identification in autonomous driving by handling uncertain or missing input modalities robustly and efficiently.
Contribution
The paper proposes a novel UMM framework that integrates multimodal token mapping, synthetic modality augmentation, and cross-modal cue interaction, leveraging CLIP for efficient multimodal fusion in resource-limited settings.
Findings
UMM achieves high robustness under uncertain modality conditions.
The framework demonstrates strong generalization across different data types.
It offers computational efficiency suitable for real-time autonomous driving applications.
Abstract
Re-Identification (ReID) is a critical technology in intelligent perception systems, especially in autonomous driving, where onboard cameras must identify pedestrians across viewpoints and over time, in real time, to support safe navigation and trajectory prediction. However, uncertain or missing input modalities (such as RGB, infrared, sketches, or textual descriptions) pose significant challenges to conventional ReID approaches. While large-scale pre-trained models offer strong multimodal semantic modeling capabilities, their computational overhead limits practical deployment in resource-constrained environments. To address these challenges, we propose a lightweight Uncertainty Modal Modeling (UMM) framework, which integrates a multimodal token mapper, a synthetic modality augmentation strategy, and a cross-modal cue interactive learner. Together, these components enable unified…
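The abstract does not say how UMM fuses modalities internally. To make the underlying problem concrete, here is a minimal sketch of retrieval when some modalities are missing. It assumes per-modality embeddings have already been produced by some CLIP-style encoder. The names `MODALITIES`, `fuse_available`, `simulate_missing`, and `rank_gallery` are hypothetical. Masked averaging and random modality dropout are generic stand-ins, not the paper's token mapper, augmentation strategy, or cue interactive learner.

```python
import numpy as np

# Hypothetical fixed modality order; not taken from the paper.
MODALITIES = ("rgb", "infrared", "sketch", "text")


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fuse_available(embeddings, present):
    """Fuse per-modality embeddings while ignoring missing modalities.

    embeddings: (M, D) array, one row per modality (rows of missing ones are ignored).
    present:    (M,) boolean mask marking which modalities were observed.
    """
    present = np.asarray(present, dtype=bool)
    if not present.any():
        raise ValueError("at least one modality must be present")
    unit = l2_normalize(embeddings)
    fused = unit[present].mean(axis=0)  # average only over observed modalities
    return l2_normalize(fused)


def simulate_missing(present, drop_prob, rng):
    """Randomly hide observed modalities (training-time dropout), keeping at least one."""
    present = np.asarray(present, dtype=bool)
    if not present.any():
        raise ValueError("at least one modality must be present")
    keep = present & (rng.random(present.shape) >= drop_prob)
    if not keep.any():
        # Fall back to a single randomly chosen observed modality.
        keep[rng.choice(np.flatnonzero(present))] = True
    return keep


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    sims = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-sims)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(len(MODALITIES), 8))
    # Query observed only through RGB and text; infrared and sketch are missing.
    query = fuse_available(emb, [True, False, False, True])
    gallery = np.stack([rng.normal(size=8), query])
    print(rank_gallery(query, gallery))
```

Normalizing each modality before averaging keeps a single high-magnitude modality from dominating the fused vector. Dropping modalities at random during training is one common way to expose a model to missing-input conditions. Whether UMM's synthetic modality augmentation resembles this is not stated in the text above.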
