Sequential Transformer for End-to-End Person Search
Long Chen, Jinhua Xu

TL;DR
This paper introduces SeqTR, a sequential transformer model that handles the person detection and re-identification sub-tasks in sequence for end-to-end person search, achieving strong results on the PRW and CUHK-SYSU benchmarks.
Contribution
The paper proposes a novel sequential transformer architecture with dedicated detection and re-ID transformers, improving end-to-end person search performance.
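However the detection and re-ID stages are arranged, an end-to-end person search model ultimately yields a box and a re-ID embedding for every detected person. At inference, person search systems commonly rank all gallery detections by cosine similarity to the query person's embedding. The sketch below assumes those per-box outputs already exist; `l2_normalize`, `rank_gallery`, and the `(image_id, boxes, embeddings)` tuple layout are hypothetical, and cosine ranking is the usual convention in the field rather than a detail confirmed by this summary.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def rank_gallery(query_emb, gallery):
    """Rank every detected gallery person by cosine similarity to the query.

    query_emb: (D,) re-ID embedding of the query person.
    gallery:   list of (image_id, boxes (N_i, 4), embeddings (N_i, D)) tuples,
               i.e. hypothetical per-image outputs of detection followed by re-ID.
    Returns (score, image_id, box) tuples sorted from most to least similar.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    results = []
    for image_id, boxes, embs in gallery:
        sims = l2_normalize(np.asarray(embs, dtype=float)) @ q  # (N_i,)
        for box, score in zip(np.asarray(boxes), sims):
            results.append((float(score), image_id, box.tolist()))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Usage with toy 2-D embeddings (purely illustrative numbers).
gallery = [
    ("img_a", [[10, 20, 60, 180], [200, 30, 250, 190]], [[1.0, 0.0], [0.0, 1.0]]),
    ("img_b", [[5, 5, 50, 150]], [[0.9, 0.1]]),
]
ranked = rank_gallery([1.0, 0.0], gallery)
```

Normalizing both sides keeps the ranking insensitive to embedding magnitude, which is why cosine similarity rather than a raw dot product is the typical choice.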
Findings
Outperforms existing methods with 59.3% mAP on PRW
Achieves 94.8% mAP on CUHK-SYSU
Demonstrates robustness and effectiveness of the sequential transformer approach
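The mAP figures above are mean average precision over query persons. The following is a simplified sketch of that metric, assuming every gallery candidate has already been labeled as a match or non-match for the query. It deliberately omits the box-matching step and other details of the official PRW and CUHK-SYSU evaluation protocols, so it illustrates the formula only; `average_precision` and `mean_average_precision` are hypothetical names.

```python
import numpy as np


def average_precision(scores, is_match):
    """Non-interpolated AP for one query: mean precision at each true-match rank.

    scores:   similarity of each gallery candidate to the query.
    is_match: 1 if the candidate has the query's identity, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    is_match = np.asarray(is_match, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest similarity first
    hits = is_match[order]
    if hits.sum() == 0:
        return 0.0
    cum_hits = np.cumsum(hits)                  # true matches seen up to each rank
    ranks = np.arange(1, len(hits) + 1)
    precision_at_hits = cum_hits[hits == 1] / ranks[hits == 1]
    return float(precision_at_hits.mean())


def mean_average_precision(queries):
    """queries: iterable of (scores, is_match) pairs, one per query person."""
    return float(np.mean([average_precision(s, m) for s, m in queries]))


# Query 1 has matches at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6.
# Query 2 has its only match at rank 2: AP = 1/2.
# mAP = (5/6 + 1/2) / 2 = 2/3.
```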
Abstract
Person search aims to simultaneously localize and recognize a target person in realistic, uncropped gallery images. One major challenge of person search comes from the contradictory goals of its two sub-tasks: person detection focuses on the commonness of all persons so as to distinguish persons from the background, while person re-identification (re-ID) focuses on the differences among persons. In this paper, we propose a novel Sequential Transformer (SeqTR) for end-to-end person search to deal with this challenge. Our SeqTR contains a detection transformer and a novel re-ID transformer that address the detection and re-ID tasks sequentially. The re-ID transformer comprises a self-attention layer that utilizes contextual information and a cross-attention layer that learns local, fine-grained discriminative features of the human body. Moreover, the re-ID…
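The abstract describes the re-ID transformer as a self-attention layer for context followed by a cross-attention layer for local, fine-grained body features. A minimal NumPy sketch of one such layer is shown below, under stated assumptions: single-head attention (the paper's tagged methods list multi-head attention), random stand-in weights, no dropout, and the hypothetical choice that each person's query cross-attends only to `L` local features sampled inside its own box. `reid_layer` and the `params` keys are illustrative names, not the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    return softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d)) @ v


def layer_norm(x, eps=1e-5):
    """Layer normalization without learned gain and bias."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def reid_layer(person_queries, local_feats, params):
    """One hypothetical re-ID transformer layer.

    person_queries: (P, D), one embedding per detected person in the image.
    local_feats:    (P, L, D), local features sampled inside each person's box.
    """
    # 1) Self-attention across persons: each person attends to all others (context).
    q, k, v = (person_queries @ params[n] for n in ("Wq_s", "Wk_s", "Wv_s"))
    x = layer_norm(person_queries + attention(q, k, v))
    # 2) Cross-attention: each person attends only to its own local features.
    q = (x @ params["Wq_c"])[:, None, :]          # (P, 1, D)
    k = local_feats @ params["Wk_c"]              # (P, L, D)
    v = local_feats @ params["Wv_c"]              # (P, L, D)
    x = layer_norm(x + attention(q, k, v)[:, 0, :])
    # 3) Position-wise feed-forward network with ReLU and a residual connection.
    h = np.maximum(0.0, x @ params["W1"])
    return layer_norm(x + h @ params["W2"])


def random_params(D, H, rng):
    """Random stand-in weights; a real model would learn these."""
    p = {n: rng.normal(scale=D ** -0.5, size=(D, D))
         for n in ("Wq_s", "Wk_s", "Wv_s", "Wq_c", "Wk_c", "Wv_c")}
    p["W1"] = rng.normal(scale=D ** -0.5, size=(D, H))
    p["W2"] = rng.normal(scale=H ** -0.5, size=(H, D))
    return p


rng = np.random.default_rng(0)
params = random_params(D=16, H=32, rng=rng)
out = reid_layer(rng.normal(size=(3, 16)), rng.normal(size=(3, 7, 16)), params)
print(out.shape)  # (3, 16)
```

Because context is mixed in before the cross-attention step in this sketch, changing one person's local features alters only that person's output, while the self-attention step still lets every person embedding depend on the others in the scene.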
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Gait Recognition and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding
