Vision Transformer for Robust Occluded Person Reidentification in Complex Surveillance Scenes

Bo Li, Duyuan Zheng, Xinyang Liu, Qingwen Li, Hong Li, Hongyan Cui, Ge Gao, and Chen Liu

arXiv:2510.27677 · cs.CV · November 3, 2025

TL;DR

This paper introduces Sh-ViT, a lightweight Vision Transformer for occluded person re-identification in complex surveillance scenes. It improves robustness to occlusion, blur, and viewpoint distortion by combining a Shuffle module, scenario-adapted data augmentation, and DeiT-based knowledge distillation.

Contribution

The paper presents Sh-ViT, a novel lightweight Vision Transformer with a Shuffle module and scenario-adapted augmentation, specifically tailored for occluded person re-identification in surveillance environments.
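
The summary does not spell out how the Shuffle module works internally. As a minimal sketch, assuming it amounts to a random permutation of the patch tokens while the class token stays in place, the idea of breaking spatial order can be illustrated as follows. The function name `shuffle_patch_tokens` and the plain-list token representation are hypothetical and chosen only for illustration.

```python
import random


def shuffle_patch_tokens(tokens, seed=None):
    """Return a copy of `tokens` with the patch tokens randomly permuted.

    Assumption (not confirmed by the paper summary): tokens[0] is the class
    token and stays first; only the remaining patch tokens are reordered.
    """
    rng = random.Random(seed)
    cls_token = tokens[0]
    patches = list(tokens[1:])  # copy so the caller's list is not modified
    rng.shuffle(patches)        # destroys the original spatial ordering
    return [cls_token] + patches


# Toy sequence: one class token followed by four 2-dimensional patch tokens.
sequence = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
shuffled = shuffle_patch_tokens(sequence, seed=0)
```

The output contains exactly the same tokens as the input, so any effect on the model comes only from the changed order, which is the property a model would need to tolerate when parts of a person are occluded or displaced.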

Findings

01

Sh-ViT achieves 83.2% Rank-1 accuracy on the MyTT dataset.

02

Sh-ViT outperforms CNN and ViT baselines on benchmark datasets.

03

The authors constructed the MyTT dataset, which covers over 10,000 pedestrians, for real-world evaluation.
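
Rank-1 accuracy, the metric reported in the findings above, is the fraction of query images whose single most similar gallery image shares the query's identity. The sketch below computes it with cosine similarity over plain Python lists. It is a simplified illustration, not the paper's evaluation code: the function names are hypothetical, the feature vectors are assumed to be non-zero, and the same-camera filtering used in standard ReID protocols is omitted.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose top-ranked gallery match has the same ID."""
    hits = 0
    for feat, pid in zip(query_feats, query_ids):
        # Index of the gallery entry most similar to this query.
        best = max(
            range(len(gallery_feats)),
            key=lambda i: cosine_similarity(feat, gallery_feats[i]),
        )
        if gallery_ids[best] == pid:
            hits += 1
    return hits / len(query_feats)


# Toy data: two queries, three gallery images.
queries = [[1.0, 0.0], [0.0, 1.0]]
query_ids = [1, 2]
gallery = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]]
gallery_ids = [1, 3, 2]
score = rank1_accuracy(queries, query_ids, gallery, gallery_ids)
```

In the toy data, the first query's nearest gallery image has the matching identity and the second query's does not, so one of two queries is a hit.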

Abstract

Person re-identification (ReID) in surveillance is challenged by occlusion, viewpoint distortion, and poor image quality. Most existing methods rely on complex modules or perform well only on clear frontal images. We propose Sh-ViT (Shuffling Vision Transformer), a lightweight and robust model for occluded person ReID. Built on ViT-Base, Sh-ViT introduces three components: first, a Shuffle module in the final Transformer layer to break spatial correlations and enhance robustness to occlusion and blur; second, scenario-adapted augmentation (geometric transforms, erasing, blur, and color adjustment) to simulate surveillance conditions; and third, DeiT-based knowledge distillation to improve learning with limited labels. To support real-world evaluation, we construct the MyTT dataset, containing over 10,000 pedestrians and 30,000+ images from base station inspections, with frequent equipment…
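
The abstract names DeiT-based knowledge distillation without saying which variant is used. As a hedged illustration only, the sketch below shows the hard-label distillation objective from DeiT, in which the class head is trained on the ground-truth label and a separate distillation head is trained on the teacher's predicted label, each weighted by one half. Whether Sh-ViT uses this exact form is an assumption, and all function and parameter names here are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of a single example, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """DeiT-style hard distillation for one example.

    cls_logits:     student class-head logits, supervised by the true label
    dist_logits:    student distillation-head logits, supervised by the teacher
    teacher_logits: teacher logits; only their argmax is used (hard label)
    """
    teacher_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return 0.5 * cross_entropy(cls_logits, label) + 0.5 * cross_entropy(
        dist_logits, teacher_label
    )


# Toy example with two identities: the teacher predicts identity 1,
# while the ground-truth label is identity 0.
loss = hard_distillation_loss(
    cls_logits=[2.0, 0.5],
    dist_logits=[0.5, 2.0],
    teacher_logits=[0.1, 3.0],
    label=0,
)
```

Because the teacher's hard prediction acts as a second label, the student receives a training signal from the teacher in addition to the ground-truth identity, which is the usual motivation for distillation when labelled data is scarce.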

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition