DETR for Crowd Pedestrian Detection
Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, and Zhidong Deng

TL;DR
This paper introduces PED, an end-to-end transformer-based pedestrian detector for crowded scenes. It addresses training efficiency and occlusion handling, outperforms previous end-to-end detectors and Faster-RCNN, and is comparable to state-of-the-art pedestrian detection methods.
Contribution
The paper proposes a novel decoder, a mechanism for occlusion exploitation, and a faster bipartite matching algorithm to improve crowd pedestrian detection with transformers.
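For context on the matching step mentioned above, here is a minimal sketch of the one-to-one bipartite matching that DETR-style detectors use during training to assign predictions to ground truths. It is not the paper's faster algorithm, which this summary does not describe. The function name `match_bruteforce` and the cost values are hypothetical, and the exhaustive search is only for illustration: practical implementations use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`), whose cost still grows quickly with the number of ground truths.

```python
from itertools import permutations

def match_bruteforce(cost):
    """Find the one-to-one assignment of predictions to ground truths
    with minimal total cost by exhaustive search.

    cost[i][j] is the (hypothetical) matching cost between prediction i
    and ground truth j. Returns (assignment, total) where assignment[j]
    is the index of the prediction matched to ground truth j.
    """
    num_preds = len(cost)
    num_gts = len(cost[0])
    best_assignment, best_total = None, float("inf")
    # Try every way of picking a distinct prediction for each ground truth.
    for candidate in permutations(range(num_preds), num_gts):
        total = sum(cost[pred][gt] for gt, pred in enumerate(candidate))
        if total < best_total:
            best_assignment, best_total = candidate, total
    return best_assignment, best_total

# Three predictions (rows), two ground truths (columns); values are made up.
cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
assignment, total = match_bruteforce(cost)
print(assignment, round(total, 3))
```

In this toy case, ground truth 0 is matched to prediction 1 and ground truth 1 to prediction 0; prediction 2 stays unmatched and would be trained as background.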
Findings
PED outperforms previous end-to-end detectors and Faster-RCNN on CityPersons and CrowdHuman datasets.
The proposed methods improve training efficiency and detection accuracy in crowded scenes.
PED achieves comparable results to state-of-the-art pedestrian detection methods.
Abstract
Pedestrian detection in crowd scenes poses a challenging problem due to the heuristically defined mapping from anchors to pedestrians and the conflict between NMS and highly overlapped pedestrians. The recently proposed end-to-end detectors (ED), DETR and deformable DETR, replace hand-designed components such as NMS and anchors using the transformer architecture, which gets rid of duplicate predictions by computing all pairwise interactions between queries. Inspired by these works, we explore their performance on crowd pedestrian detection. Surprisingly, compared to Faster-RCNN with FPN, the results are opposite to those obtained on COCO. Furthermore, the bipartite match of ED harms the training efficiency due to the large number of ground truths in crowd scenes. In this work, we identify the underlying motives driving ED's poor performance and propose a new decoder to address them. Moreover, we…
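The conflict between NMS and highly overlapped pedestrians can be illustrated with a minimal sketch of standard greedy NMS. The box coordinates, scores, and thresholds below are made-up values, and `greedy_nms` is an illustrative helper, not code from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box whose IoU with an
    already kept box exceeds iou_threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two heavily overlapping pedestrians plus one isolated pedestrian.
boxes = [(0, 0, 100, 200), (20, 0, 120, 200), (300, 0, 400, 200)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, iou_threshold=0.5))
print(greedy_nms(boxes, scores, iou_threshold=0.7))
```

With a threshold of 0.5, the second pedestrian is suppressed because its box overlaps the first with an IoU of about 0.67, even though both detections are correct. Raising the threshold to 0.7 keeps both, but in general a looser threshold also lets more true duplicates through, which is the trade-off that motivates NMS-free detectors.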
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Fire Detection and Safety Systems
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Multi-Head Attention · Residual Connection · Convolution · 1x1 Convolution · Attention Is All You Need · Byte Pair Encoding
