Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting
David Latortue, Moetez Kdayem, Fidel A. Guerrero Peña, Eric Granger, Marco Pedersoli

TL;DR
This paper investigates how different supervision levels affect infrared-based people counting, showing that CNN image-level models can match the counting performance of YOLO detectors and point-level models while running at a higher frame rate with a similar number of parameters.
Contribution
It analyzes supervision trade-offs in infrared people counting, demonstrating that a CNN trained with image-level supervision can achieve competitive accuracy at a higher speed.
Findings
CNN image-level models perform comparably to YOLO detectors and point-level models
Weaker supervision levels can still yield effective counting results
Image-level models achieve higher frame rates with a similar number of parameters
Abstract
Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision can affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a CNN Image-Level model achieves competitive results with YOLO detectors and point-level models, yet provides a higher frame rate and a similar number of model parameters.
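To make the three supervision levels concrete: a detector such as YOLO is trained on one bounding box per person, a point-level model on one point per person, and an image-level model only on the number of people in each image. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes box annotations exist, takes the box centre as the point annotation (real point labels are often placed on heads instead), and frames image-level counting as classification over count classes 0 to max_count, with larger counts clipped into the last class. All names (Box, boxes_to_points, points_to_count, count_to_class, predicted_count) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Bounding-box supervision: one axis-aligned box per person, in pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def boxes_to_points(boxes: List[Box]) -> List[Tuple[float, float]]:
    """Point-level supervision: one (x, y) point per person.

    Assumption: the point is the box centre; real datasets may use head points.
    """
    return [((b.x_min + b.x_max) / 2, (b.y_min + b.y_max) / 2) for b in boxes]


def points_to_count(points: List[Tuple[float, float]]) -> int:
    """Image-level supervision: only the number of people in the image."""
    return len(points)


def count_to_class(count: int, max_count: int) -> int:
    """Hypothetical label encoding: counting as classification over
    {0, ..., max_count}; counts above max_count share the last class."""
    return min(count, max_count)


def predicted_count(logits: List[float]) -> int:
    """Decode an image-level classifier: the predicted count is the argmax class."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    boxes = [Box(10, 20, 30, 60), Box(50, 40, 70, 90)]
    points = boxes_to_points(boxes)            # [(20.0, 40.0), (60.0, 65.0)]
    count = points_to_count(points)            # 2
    label = count_to_class(count, max_count=5) # class index 2
    print(points, count, label, predicted_count([0.1, 0.3, 2.5, 0.2]))
```

Each step discards information: boxes give location and extent, points give location only, and the image-level label gives just the count. Deriving weaker labels from boxes is done here only for illustration; in practice the appeal of weaker supervision is that annotators produce the cheaper labels directly.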
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
