Leveraging Self-Supervision for Cross-Domain Crowd Counting
Weizhe Liu, Nikita Durasov, Pascal Fua

TL;DR
This paper introduces a novel cross-domain crowd counting method that combines synthetic and unlabeled real images using self-supervision, improving generalization without extra inference costs.
Contribution
It proposes a self-supervised training approach that leverages unlabeled real images and synthetic data, enhancing cross-domain crowd counting performance.
Findings
Outperforms state-of-the-art cross-domain crowd counting methods
Uses perspective-aware feature learning and uncertainty prediction
No additional inference computation required
Abstract
State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. While effective, these data-driven approaches require a large amount of annotated data to achieve good performance, which prevents them from being deployed in emergencies during which data annotation is either too costly or cannot be obtained fast enough. One popular solution is to use synthetic data for training. Unfortunately, due to domain shift, the resulting models generalize poorly to real imagery. We remedy this shortcoming by training with both synthetic images, along with their associated labels, and unlabeled real images. To this end, we force our network to learn perspective-aware features by training it to distinguish upside-down real images from regular ones and incorporate into it the ability to predict its own uncertainty so that it can generate useful…
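The upside-down recognition task mentioned in the abstract can be illustrated with a small sketch of how such a self-supervised batch might be built from unlabeled real images. This is only an assumption-laden illustration, not the authors' implementation: the function name `make_flip_pretext_batch`, the `(N, H, W, C)` array layout, and the 50/50 flip probability are all hypothetical choices, and the network, loss, and uncertainty prediction are not shown.

```python
import numpy as np

def make_flip_pretext_batch(images, rng):
    """Build a pretext batch from unlabeled images (hypothetical helper).

    images: array of shape (N, H, W, C).
    Returns (batch, labels), where labels[i] == 1 means batch[i]
    has been turned upside-down and 0 means it is unchanged.
    """
    images = np.asarray(images)
    # Assumption: each image is flipped with probability 0.5.
    labels = rng.integers(0, 2, size=images.shape[0])
    batch = images.copy()
    flipped = labels == 1
    # Reverse the height axis of the selected images only.
    batch[flipped] = images[flipped, ::-1]
    return batch, labels

# Usage with random stand-in data in place of real crowd images.
rng = np.random.default_rng(0)
real_images = rng.random((4, 8, 8, 3))
batch, labels = make_flip_pretext_batch(real_images, rng)
```

Because the labels come for free from the flip itself, a binary classifier can be trained on `batch` and `labels` without any crowd annotations; in a crowd scene, telling the two orientations apart plausibly requires some sense of how apparent person size changes from the bottom to the top of the image.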
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
