TL;DR
This paper introduces Crowd-SDNet, a self-training method that trains an object detector from point annotations alone to estimate both object centers and sizes in crowded scenes, improving detection and counting accuracy over prior point-supervised methods.
Contribution
The paper presents a novel self-training framework that estimates object sizes from point annotations and refines them iteratively, enhancing detection and counting in crowded scenes.
Findings
Outperforms state-of-the-art point-supervised methods in detection and counting.
Improves average precision by over 10% on WiderFace.
Reduces counting error by 31.2% on benchmark datasets.
Abstract
In this paper, we propose a novel self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations (i.e., objects are labeled with points) to estimate both the center points and sizes of crowded objects. Specifically, during training, we utilize the available point annotations to supervise the estimation of the center points of objects directly. Based on a locally-uniform distribution assumption, we initialize pseudo object sizes from the point-level supervisory information, which are then leveraged to guide the regression of object sizes via a crowdedness-aware loss. Meanwhile, we propose a confidence and order-aware refinement scheme to continuously refine the initial pseudo object sizes such that the ability of the detector is increasingly boosted to detect and count objects in crowds simultaneously. Moreover, to address…
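The pseudo-size initialization described in the abstract can be illustrated with a small sketch. It assumes one common reading of a locally-uniform distribution: nearby objects have similar sizes and spacing, so the distance from an annotated point to its nearest annotated neighbor serves as a rough initial size. The function names, the `scale` factor, and the `default` fallback for isolated points are hypothetical and are not taken from the paper, whose actual initialization may differ.

```python
import math

def init_pseudo_sizes(points, scale=1.0, default=32.0):
    """Assumed initialization: side length = scale * nearest-neighbor distance.

    points  -- list of (x, y) point annotations for one image
    scale   -- hypothetical factor relating spacing to object size
    default -- hypothetical fallback when an image has a single point
    """
    sizes = []
    for i, p in enumerate(points):
        # Distances from this point to every other annotated point.
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        sizes.append(scale * min(dists) if dists else default)
    return sizes

def to_boxes(points, sizes):
    """Turn center points and side lengths into (x1, y1, x2, y2) square boxes."""
    return [(x - s / 2, y - s / 2, x + s / 2, y + s / 2)
            for (x, y), s in zip(points, sizes)]

points = [(10.0, 10.0), (13.0, 14.0), (40.0, 10.0)]
sizes = init_pseudo_sizes(points)
boxes = to_boxes(points, sizes)
```

In this toy example the two close points receive small pseudo boxes and the isolated point receives a much larger one, which is the kind of coarse initial estimate that an iterative refinement scheme would then be expected to correct.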
