Cascade Transformers for End-to-End Person Search

Rui Yu; Dawei Du; Rodney LaLonde; Daniel Davila; Christopher Funk; Anthony Hoogs; Brian Clipp

arXiv:2203.09642·cs.CV·March 21, 2022

TL;DR

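This paper introduces COAT, a three-stage cascade transformer for end-to-end person search. The first stage focuses on detecting people, and the later stages progressively refine the features used for both detection and re-identification. An occluded attention mechanism makes the model more robust to occlusions and to pose and scale variations, and the paper reports state-of-the-art results.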

Contribution

The paper presents the Cascade Occluded Attention Transformer (COAT), which refines person search over multiple stages and uses occluded attention to improve robustness to occlusions and pose variations.

Findings

01 Achieves state-of-the-art performance on benchmark datasets.

02 Effectively handles occlusions and pose variations.

03 Demonstrates the benefit of multi-stage refinement in person search.

Abstract

The goal of person search is to localize a target person from a gallery set of scene images, which is extremely challenging due to large scale variations, pose/viewpoint changes, and occlusions. In this paper, we propose the Cascade Occluded Attention Transformer (COAT) for end-to-end person search. Our three-stage cascade design focuses on detecting people in the first stage, while later stages simultaneously and progressively refine the representation for person detection and re-identification. At each stage the occluded attention transformer applies tighter intersection over union thresholds, forcing the network to learn coarse-to-fine pose/scale invariant features. Meanwhile, we calculate each detection's occluded attention to differentiate a person's tokens from other people or the background. In this way, we simulate the effect of other objects occluding a person of interest at…
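The coarse-to-fine idea in the abstract, where each stage applies a tighter intersection-over-union (IoU) threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `iou` and `positives_per_stage` and the threshold values 0.5, 0.6, 0.7 are assumptions chosen for illustration, and the sketch only shows how the set of proposals counted as positives shrinks from stage to stage. It does not model the box refinement or the transformer itself.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposals: Sequence[Box],
    gt_boxes: Sequence[Box],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed values, one per stage
) -> List[List[int]]:
    """For each stage, return indices of proposals whose best IoU with any
    ground-truth box meets that stage's threshold."""
    stages = []
    for t in thresholds:
        keep = [
            i
            for i, p in enumerate(proposals)
            if max((iou(p, g) for g in gt_boxes), default=0.0) >= t
        ]
        stages.append(keep)
    return stages


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    props = [
        (0.0, 0.0, 10.0, 10.0),    # perfect overlap
        (0.0, 0.0, 10.0, 6.5),     # IoU 0.65
        (0.0, 0.0, 10.0, 5.5),     # IoU 0.55
        (20.0, 20.0, 30.0, 30.0),  # no overlap
    ]
    for stage, keep in enumerate(positives_per_stage(props, gt), start=1):
        print(f"stage {stage}: positives {keep}")
```

With tighter thresholds, later stages train only on better-localized boxes, which is the usual motivation for a cascade of this kind.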

Code & Models

Repositories

kitware/coat (PyTorch, official)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Dropout