End-to-end people detection in crowded scenes

Russell Stewart; Mykhaylo Andriluka

arXiv:1506.04878·cs.CV·July 10, 2015·29 cites

TL;DR

This paper introduces an end-to-end model for detecting people in crowded scenes. The model directly outputs a set of detections with no post-processing, using a recurrent LSTM layer to generate detections in sequence and a set-based loss function for training.

Contribution

It presents a novel end-to-end detection framework that eliminates the need for post-processing steps like non-maximum suppression, improving detection in crowded scenes.
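For context, the post-processing step that the paper removes can be sketched as follows. This is a minimal greedy non-maximum suppression in plain Python, not code from the paper. The `(x1, y1, x2, y2)` box format, the function names `iou` and `greedy_nms`, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold (an illustrative default, not a value from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two people standing close together, plus one person far away.
boxes = [(0, 0, 10, 20), (3, 0, 13, 20), (50, 0, 60, 20)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this toy example the second box overlaps the first with an IoU of about 0.54, so it is suppressed even though it could be a separate person. That is the kind of failure that can arise in crowded scenes when duplicates are removed by overlap alone.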

Findings

01 Demonstrates effective people detection in crowded scenes

02 Eliminates the need for non-maximum suppression

03 Uses a recurrent LSTM layer trained with a set-based loss

Abstract

Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
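The abstract does not spell out the loss, so the following is only a sketch of the general idea of a loss that operates on sets of detections: match predictions to ground-truth boxes with a minimum-cost assignment, then penalize localization error on matched pairs and confidence error on all predictions. The box format, the L1 cost, the cross-entropy term, the `alpha` weight, and the function names are assumptions for illustration, not the authors' formulation. The matching is done by brute force over permutations to keep the example dependency-free; the Hungarian algorithm solves the same assignment problem in polynomial time.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

# Hypothetical box format for this sketch: (cx, cy, w, h).
Box = Tuple[float, float, float, float]


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_matching(preds: Sequence[Box], gts: Sequence[Box]):
    """Return (match, cost) where match[i] is the prediction index assigned
    to gts[i] and cost is the minimal total L1 distance.

    Brute force over permutations: fine for a handful of boxes only.
    """
    assert len(preds) >= len(gts), "sketch assumes at least as many predictions"
    best_cost, best = math.inf, ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1(preds[j], gts[i]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best, best_cost


def set_loss(preds: Sequence[Box], confs: Sequence[float],
             gts: Sequence[Box], alpha: float = 1.0) -> float:
    """Localization loss on matched pairs plus cross-entropy on confidences.

    Matched predictions are pushed toward confidence 1, unmatched toward 0.
    alpha is an illustrative weighting, not a value from the paper.
    """
    match, loc_cost = best_matching(preds, gts)
    matched = set(match)
    conf_cost = 0.0
    for j, c in enumerate(confs):
        c = min(max(c, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        conf_cost += -math.log(c) if j in matched else -math.log(1.0 - c)
    return alpha * loc_cost + conf_cost


gts = [(0, 0, 4, 8), (10, 10, 4, 8)]
preds = [(10, 10, 4, 8), (0, 0, 4, 8), (30, 30, 4, 8)]
confs = [0.9, 0.9, 0.1]
match, _ = best_matching(preds, gts)
loss = set_loss(preds, confs, gts)
# Reordering the predictions leaves the loss unchanged.
loss_reordered = set_loss(preds[::-1], confs[::-1], gts)
```

Because the assignment is recomputed for every input, this simplified loss does not depend on the order in which the detections are emitted, which is the property that lets a model be trained on sets rather than on a fixed target sequence.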

Code & Models

Videos

End-To-End People Detection in Crowded Scenes · YouTube

Taxonomy

Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition