Semantic Head Enhanced Pedestrian Detection in a Crowd
Ruiqi Lu, Huimin Ma

TL;DR
This paper introduces HBAN, a pedestrian detection model that leverages semantic head detection and head-body alignment to improve detection accuracy in crowded scenes with occlusion.
Contribution
The paper proposes a novel head-body alignment network that uses weakly labeled semantic head regions and alignment loss to enhance pedestrian detection.
Findings
HBAN outperforms baseline models on the CityPersons dataset.
Semantic head detection improves robustness against occlusion.
Head-body alignment contributes significantly to detection accuracy.
Abstract
Pedestrian detection in the crowd is a challenging task because of intra-class occlusion. More prior information is needed for the detector to be robust against it. Human head area is naturally a strong cue because of its stable appearance, visibility and relative location to body. Inspired by it, we adopt an extra branch to conduct semantic head detection in parallel with traditional body branch. Instead of manually labeling the head regions, we use weak annotations inferred directly from body boxes, which is named as 'semantic head'. In this way, the head detection is formulated into using a special part of labeled box to detect the corresponding part of human body, which surprisingly improves the performance and robustness to occlusion. Moreover, the head-body alignment structure is explicitly explored by introducing Alignment Loss, which functions in a self-supervised manner. Based…
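The abstract says the head annotations are inferred directly from the body boxes rather than labeled by hand. A minimal sketch of that idea follows, assuming the semantic head is a fixed, top-centered fraction of the body box. The function name `semantic_head_box` and the default fractions are hypothetical illustrations, not values taken from the paper.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates, (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def semantic_head_box(body: Box, height_frac: float = 0.2, width_frac: float = 0.5) -> Box:
    """Derive a weak head annotation from a body box.

    Assumption (not from the paper): the head occupies the top
    `height_frac` of the body box and the central `width_frac` of its width.
    """
    if not (0.0 < height_frac <= 1.0 and 0.0 < width_frac <= 1.0):
        raise ValueError("fractions must lie in (0, 1]")
    width = body.x2 - body.x1
    height = body.y2 - body.y1
    center_x = body.x1 + width / 2.0
    half_head_width = width * width_frac / 2.0
    return Box(
        x1=center_x - half_head_width,
        y1=body.y1,  # head shares the top edge of the body box
        x2=center_x + half_head_width,
        y2=body.y1 + height * height_frac,
    )


if __name__ == "__main__":
    body = Box(10.0, 20.0, 110.0, 220.0)
    print(semantic_head_box(body, height_frac=0.25, width_frac=0.5))
```

Because the head box is a deterministic function of the body box, no extra labeling is needed, which matches the weak-annotation idea in the abstract. The exact region the authors use may differ from this sketch.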
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Indoor and Outdoor Localization Technologies
