Vision Transformer for Robust Occluded Person Reidentification in Complex Surveillance Scenes
Bo Li, Duyuan Zheng, Xinyang Liu, Qingwen Li, Hong Li, Hongyan Cui, Ge Gao, and Chen Liu

TL;DR
This paper introduces Sh-ViT, a lightweight Vision Transformer for occluded person re-identification in complex surveillance scenes. It improves robustness to occlusion, blur, and viewpoint distortion through a Shuffle module, scenario-adapted data augmentation, and knowledge distillation.
Contribution
The paper presents Sh-ViT, a novel lightweight Vision Transformer with a Shuffle module and scenario-adapted augmentation, specifically tailored for occluded person re-identification in surveillance environments.
Findings
Sh-ViT achieves 83.2% Rank-1 accuracy on the MyTT dataset.
Sh-ViT outperforms CNN and ViT baselines on benchmark datasets.
The authors constructed the MyTT dataset, with over 10,000 pedestrians and 30,000+ images, for real-world evaluation.
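For readers unfamiliar with the Rank-1 metric quoted above, the following is a minimal sketch of how it is commonly computed: a query counts as a hit when its single nearest gallery image carries the same identity. The function name `rank1_accuracy`, the toy distance matrix, and the identity labels are illustrative assumptions, not taken from the paper, and the sketch omits the same-camera filtering that standard ReID evaluation protocols usually apply.

```python
def rank1_accuracy(dist, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery item has the same identity.

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Hypothetical helper for illustration.
    """
    hits = 0
    for q, row in enumerate(dist):
        # Index of the closest gallery item for this query.
        nearest = min(range(len(row)), key=row.__getitem__)
        if gallery_ids[nearest] == query_ids[q]:
            hits += 1
    return hits / len(dist)


# Toy data (assumed): 2 queries, 3 gallery images.
dist = [
    [0.2, 0.9, 0.5],  # query 0 is closest to gallery item 0
    [0.8, 0.4, 0.3],  # query 1 is closest to gallery item 2
]
query_ids = [7, 9]
gallery_ids = [7, 9, 3]

score = rank1_accuracy(dist, query_ids, gallery_ids)
```

In this toy case the first query retrieves the correct identity and the second does not, so half of the queries are Rank-1 hits.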
Abstract
Person re-identification (ReID) in surveillance is challenged by occlusion, viewpoint distortion, and poor image quality. Most existing methods rely on complex modules or perform well only on clear frontal images. We propose Sh-ViT (Shuffling Vision Transformer), a lightweight and robust model for occluded person ReID. Built on ViT-Base, Sh-ViT introduces three components: first, a Shuffle module in the final Transformer layer to break spatial correlations and enhance robustness to occlusion and blur; second, scenario-adapted augmentation (geometric transforms, erasing, blur, and color adjustment) to simulate surveillance conditions; third, DeiT-based knowledge distillation to improve learning with limited labels. To support real-world evaluation, we construct the MyTT dataset, containing over 10,000 pedestrians and 30,000+ images from base station inspections, with frequent equipment…
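The abstract does not spell out how the Shuffle module works, so the following is only a rough sketch of one plausible reading: permuting the patch tokens of a sequence while leaving the class token in place, so that later processing cannot depend on the original spatial order. The function name `shuffle_patch_tokens`, the class-token-first layout, and the use of string placeholders instead of embedding vectors are all assumptions made for illustration and may differ from the authors' design.

```python
import random


def shuffle_patch_tokens(tokens, seed=None):
    """Return a copy of `tokens` with the patch tokens randomly permuted.

    Assumption: tokens[0] is the class token and is kept in place;
    every remaining entry is a patch token. Hypothetical helper,
    not the paper's implementation.
    """
    rng = random.Random(seed)
    cls_token = tokens[0]
    patches = list(tokens[1:])
    rng.shuffle(patches)  # in-place random permutation of the patch order
    return [cls_token] + patches


# Toy sequence (assumed): one class token followed by 8 patch tokens.
tokens = ["cls"] + [f"patch_{i}" for i in range(8)]
shuffled = shuffle_patch_tokens(tokens, seed=0)
```

The shuffled sequence contains exactly the same tokens as the input, only in a different patch order. In a real model each entry would be an embedding vector rather than a string.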
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
