Not Every Patch is Needed: Towards a More Efficient and Effective Backbone for Video-based Person Re-identification

Lanyun Zhu; Tianrun Chen; Deyi Ji; Jieping Ye; Jun Liu

arXiv:2501.16811 · cs.CV · January 29, 2025



TL;DR

This paper introduces a selective, patch-based backbone for video-based person re-identification that substantially reduces computational cost while maintaining or improving accuracy, by extracting features only from crucial patches and exploiting pseudo frame global context.

Contribution

A novel patch selection mechanism and network structure that together enable efficient and effective video-based person ReID with reduced computation.

Findings

01. Reduces computational cost by 74% compared with ViT-B.
02. Achieves accuracy comparable to ViT-B.
03. Outperforms ResNet-50 in accuracy.

Abstract

This paper proposes a new effective and efficient plug-and-play backbone for video-based person re-identification (ReID). Conventional video-based ReID methods typically use CNN or transformer backbones to extract deep features for every position in every sampled video frame. Here, we argue that this exhaustive feature extraction could be unnecessary, since we find that different frames in a ReID video often exhibit small differences and contain many similar regions due to the relatively slight movements of human beings. Inspired by this, a more selective, efficient paradigm is explored in this paper. Specifically, we introduce a patch selection mechanism to reduce computational cost by choosing only the crucial and non-repetitive patches for feature extraction. Additionally, we present a novel network structure that generates and utilizes pseudo frame global context to address the…
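The abstract describes keeping only "crucial and non-repetitive" patches but does not state the selection rule. As a rough illustration of the "non-repetitive" half of that idea only, the sketch below keeps a patch when it differs from the same grid position in the previous frame by more than a threshold. The functions `select_patches`, `mean_abs_diff` and `kept_fraction`, the mean-absolute-difference criterion and the threshold are all hypothetical and are not the paper's mechanism. The sketch also ignores how "crucial" patches would be scored and how skipped positions are filled in by the pseudo frame global context.

```python
"""Hypothetical sketch of redundancy-based patch selection (NOT the paper's method)."""
from typing import List

Patch = List[float]  # a flattened patch of pixel values
Frame = List[Patch]  # one frame as a list of patches in a fixed grid order


def mean_abs_diff(a: Patch, b: Patch) -> float:
    """Average absolute pixel difference between two equally sized patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_patches(frames: List[Frame], threshold: float) -> List[List[int]]:
    """Return, per frame, the indices of patches that would go to the backbone.

    Assumption: every patch of the first frame is kept. In later frames a patch
    is kept only if it differs from the same grid position in the previous
    frame by more than `threshold`; otherwise it is treated as repetitive.
    """
    selected: List[List[int]] = []
    for t, frame in enumerate(frames):
        if t == 0:
            selected.append(list(range(len(frame))))
            continue
        prev = frames[t - 1]
        keep = [
            i
            for i, (patch, prev_patch) in enumerate(zip(frame, prev))
            if mean_abs_diff(patch, prev_patch) > threshold
        ]
        selected.append(keep)
    return selected


def kept_fraction(selected: List[List[int]], patches_per_frame: int) -> float:
    """Share of all patch positions that would actually be processed."""
    total = patches_per_frame * len(selected)
    return sum(len(s) for s in selected) / total


if __name__ == "__main__":
    # Three toy frames, each with three 2-pixel patches.
    frames = [
        [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]],
        [[0.0, 0.0], [1.0, 0.9], [0.9, 0.9]],  # only patch 2 changes a lot
        [[0.0, 0.1], [1.0, 0.9], [0.9, 0.9]],  # only a small change in patch 0
    ]
    sel = select_patches(frames, threshold=0.1)
    print(sel, kept_fraction(sel, patches_per_frame=3))
```

In this toy input only 4 of the 9 patch positions are selected, which shows how temporal redundancy can shrink the amount of feature extraction. The actual savings reported above (74% versus ViT-B) come from the paper's own mechanism and cannot be inferred from this sketch, especially since transformer cost does not scale linearly with the number of tokens.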


Taxonomy

Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Face Recognition and Analysis