Sequential Transformer for End-to-End Person Search

Long Chen; Jinhua Xu

arXiv:2211.04323·cs.CV·November 17, 2022

TL;DR

This paper introduces SeqTR, a sequential transformer that handles detection and re-identification in two successive stages for end-to-end person search, reaching 59.3% mAP on PRW and 94.8% mAP on CUHK-SYSU.

Contribution

The paper proposes a novel sequential transformer architecture with dedicated detection and re-ID transformers, improving end-to-end person search performance.

Findings

01 Outperforms existing methods with 59.3% mAP on PRW

02 Achieves 94.8% mAP on CUHK-SYSU

03 Demonstrates robustness and effectiveness of the sequential transformer approach
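
The mAP figures above are mean average precision scores. As a rough illustration of how such a number is formed, the sketch below computes average precision for one ranked gallery list and averages it over queries. It assumes the standard retrieval definition; the function names and the toy inputs are hypothetical, and person-search benchmarks apply their own matching rules (for example, deciding when a detected box counts as a hit) that are not modeled here.

```python
def average_precision(ranked_hits, num_relevant):
    """AP for one query.

    ranked_hits: booleans ordered by descending similarity, True where the
                 gallery result is the queried identity.
    num_relevant: total number of true matches for this query in the gallery.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_relevant if num_relevant else 0.0


def mean_average_precision(per_query):
    """per_query: list of (ranked_hits, num_relevant) pairs, one per query."""
    aps = [average_precision(hits, n) for hits, n in per_query]
    return sum(aps) / len(aps)
```

Dividing by `num_relevant` rather than by the number of hits found means a true match that never appears in the ranking still lowers the score.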

Abstract

Person Search aims to simultaneously localize and recognize a target person from realistic and uncropped gallery images. One major challenge of person search comes from the contradictory goals of the two sub-tasks, i.e., person detection focuses on finding the commonness of all persons so as to distinguish persons from the background, while person re-identification (re-ID) focuses on the differences among different persons. In this paper, we propose a novel Sequential Transformer (SeqTR) for end-to-end person search to deal with this challenge. Our SeqTR contains a detection transformer and a novel re-ID transformer that sequentially addresses detection and re-ID tasks. The re-ID transformer comprises the self-attention layer that utilizes contextual information and the cross-attention layer that learns local fine-grained discriminative features of the human body. Moreover, the re-ID…
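
The order of operations described in the abstract (self-attention for context, then cross-attention for local detail) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses single-head attention with no learned projections, layer normalization, or feed-forward sublayer, and `reid_layer`, `person_queries`, and `image_tokens` are hypothetical names. The person queries are assumed to come from a preceding detection stage, which is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def reid_layer(person_queries, image_tokens):
    """One simplified re-ID layer (assumed structure, for illustration only)."""
    # Self-attention: each person query attends to all detected persons (context).
    context = attention(person_queries, person_queries, person_queries)
    context = [[a + b for a, b in zip(q, c)]
               for q, c in zip(person_queries, context)]  # residual connection
    # Cross-attention: each query attends to image feature tokens (local detail).
    local = attention(context, image_tokens, image_tokens)
    return [[a + b for a, b in zip(c, l)] for c, l in zip(context, local)]
```

Calling `reid_layer(queries, tokens)` with two queries and any number of image tokens returns two refined embeddings, one per detected person, each with the same dimension as the inputs.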

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Gait Recognition and Analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding