Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

Xianghao Zang; Ge Li; Wei Gao

arXiv:2202.06014 · cs.CV · April 7, 2022

TL;DR

This paper introduces PiT, a transformer-based model with multi-directional and multi-scale pyramids that enhances fine-grained feature extraction for video-based pedestrian re-identification, achieving state-of-the-art results.

Contribution

It proposes a novel multi-direction and multi-scale pyramid structure within transformers to better capture fine-grained, part-informed features for pedestrian retrieval.

Findings

1. Achieves state-of-the-art performance on MARS and iLIDS-VID benchmarks.

2. Demonstrates the effectiveness of multi-directional and multi-scale pyramids through ablation studies.

3. Outperforms existing methods in video-based pedestrian re-identification.

Abstract

In video surveillance, pedestrian retrieval (also called person re-identification) is a critical task. This task aims to retrieve the pedestrian of interest across non-overlapping cameras. Recently, transformer-based models have achieved significant progress on this task. However, these models still overlook fine-grained, part-informed information. This paper proposes a multi-direction and multi-scale Pyramid in Transformer (PiT) to solve this problem. In a transformer-based architecture, each pedestrian image is split into many patches. These patches are then fed to transformer layers to obtain the feature representation of the image. To explore the fine-grained information, this paper proposes to apply vertical division and horizontal division to these patches to generate human parts in different directions. These parts provide more fine-grained information. To fuse multi-scale…
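The division step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `split_patch_grid`, the row-major token layout, the `axis` labels, and the toy grid size are all assumptions made for illustration, and the abstract as shown here does not say how the parts are fused or how many parts each scale uses. The sketch only shows how a grid of patch tokens might be grouped into contiguous stripes along either direction.

```python
def split_patch_grid(tokens, grid_h, grid_w, num_parts, axis):
    """Group a row-major list of patch tokens into contiguous stripes.

    Hypothetical helper, not taken from the PiT code base.
    axis="rows": each part is a band of consecutive patch rows.
    axis="cols": each part is a band of consecutive patch columns.
    """
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    if axis not in ("rows", "cols"):
        raise ValueError("axis must be 'rows' or 'cols'")

    size = grid_h if axis == "rows" else grid_w
    if size % num_parts != 0:
        raise ValueError("grid dimension must be divisible by num_parts")
    step = size // num_parts

    parts = []
    for p in range(num_parts):
        lo, hi = p * step, (p + 1) * step
        if axis == "rows":
            # Take every column of rows lo..hi-1.
            part = [tokens[r * grid_w + c]
                    for r in range(lo, hi) for c in range(grid_w)]
        else:
            # Take columns lo..hi-1 of every row.
            part = [tokens[r * grid_w + c]
                    for r in range(grid_h) for c in range(lo, hi)]
        parts.append(part)
    return parts


# Toy 4x2 patch grid; integers stand in for token embeddings.
grid_h, grid_w = 4, 2
tokens = list(range(grid_h * grid_w))

# Two part counts along the row axis stand in for two "scales" (assumed values).
row_parts = split_patch_grid(tokens, grid_h, grid_w, num_parts=2, axis="rows")
fine_row_parts = split_patch_grid(tokens, grid_h, grid_w, num_parts=4, axis="rows")
col_parts = split_patch_grid(tokens, grid_h, grid_w, num_parts=2, axis="cols")
```

In a real model each token would be an embedding vector and each part would be pooled or passed through further layers; here integers are used so the grouping is easy to inspect.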

Code & Models

Repositories

deropty/PiT (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Multi-Head Attention · Dense Connections · Byte Pair Encoding · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer