Video Individual Counting With Implicit One-to-Many Matching
Xuhui Zhu, Jing Xu, Bingjie Wang, Huikang Dai, Hao Lu

TL;DR
This paper introduces OMAN, a Video Individual Counting (VIC) model that uses implicit one-to-many matching to identify the same pedestrians across video frames, exploiting the social grouping behavior of walking pedestrians to improve counting accuracy.
Contribution
The paper proposes a new VIC approach using implicit one-to-many matching, addressing limitations of previous one-to-one strategies and achieving state-of-the-art results.
Findings
OMAN outperforms existing VIC methods on benchmarks.
Implicit one-to-many matching improves pedestrian correspondence accuracy.
The model effectively leverages social grouping behavior.
Abstract
Video Individual Counting (VIC) is a recently introduced task that aims to estimate pedestrian flux from a video. It extends conventional Video Crowd Counting (VCC) beyond the per-frame pedestrian count. In contrast to VCC, which only learns to count repeated pedestrian patterns across frames, the key problem of VIC is how to identify co-existent pedestrians between frames, which turns out to be a correspondence problem. Existing VIC approaches, however, mainly follow a one-to-one (O2O) matching strategy, where the same pedestrian must be exactly matched between frames, leading to sensitivity to appearance variations or missing detections. In this work, we show that the O2O matching could be relaxed to a one-to-many (O2M) matching problem, which better fits the problem nature of VIC and can leverage the social grouping behavior of walking pedestrians. We therefore introduce OMAN, a simple…
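The contrast between O2O and O2M matching can be illustrated with a toy sketch. This is not OMAN's method: OMAN's matching is implicit and learned, whereas the code below uses explicit hand-written rules. It assumes a common VIC accumulation (first-frame count plus the inflow of new pedestrians in each consecutive frame pair), toy 2-D feature vectors in place of learned embeddings, cosine similarity with a hypothetical threshold, and a greedy one-to-one rule standing in for Hungarian matching. All function names (`inflow_o2o`, `inflow_o2m`, `video_count`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Toy feature vector for one detected pedestrian (assumption: a real VIC
# model would use learned embeddings, not hand-written 2-D vectors).
Feature = Sequence[float]
Frame = List[Feature]


def cosine(a: Feature, b: Feature) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity(prev: Frame, curr: Frame) -> List[List[float]]:
    """sim[i][j] = similarity of current pedestrian i to previous pedestrian j."""
    return [[cosine(c, p) for p in prev] for c in curr]


def inflow_o2o(prev: Frame, curr: Frame, thresh: float) -> int:
    """Explicit one-to-one rule (greedy, not Hungarian): each previous pedestrian
    can be matched to at most one current pedestrian."""
    sim = similarity(prev, curr)
    candidates = sorted(
        ((sim[i][j], i, j) for i in range(len(curr)) for j in range(len(prev))),
        reverse=True,
    )
    used_curr, used_prev = set(), set()
    for score, i, j in candidates:
        if score < thresh:
            break  # remaining pairs are even less similar
        if i in used_curr or j in used_prev:
            continue  # one-to-one: each side may be used only once
        used_curr.add(i)
        used_prev.add(j)
    # Unmatched current pedestrians are treated as newly entered (inflow).
    return len(curr) - len(used_curr)


def inflow_o2m(prev: Frame, curr: Frame, thresh: float) -> int:
    """One-to-many relaxation: a current pedestrian is co-existent if it is
    similar enough to ANY previous pedestrian, so one previous pedestrian may
    account for several current ones (e.g. members of a walking group)."""
    sim = similarity(prev, curr)
    matched = sum(1 for row in sim if row and max(row) >= thresh)
    return len(curr) - matched


def video_count(frames: List[Frame], thresh: float,
                inflow_fn: Callable[[Frame, Frame, float], int]) -> int:
    """Assumed VIC accumulation: first-frame count plus inflow of each frame pair."""
    if not frames:
        return 0
    total = len(frames[0])
    for prev, curr in zip(frames, frames[1:]):
        total += inflow_fn(prev, curr, thresh)
    return total


# Two people walking together; the second one is missed in the middle frame.
f0 = [[1.0, 0.0], [0.95, 0.3]]
f1 = [[1.0, 0.05]]
f2 = [[1.0, 0.0], [0.95, 0.3]]
frames = [f0, f1, f2]
print(video_count(frames, 0.9, inflow_o2o))  # 3: the re-detected person is counted again
print(video_count(frames, 0.9, inflow_o2m))  # 2: matches the two unique people
```

In this toy case the missed detection in the middle frame makes the O2O rule count the re-detected group member as a new pedestrian, while the O2M rule lets that person match their companion. The relaxation has a cost: a genuinely new pedestrian who resembles someone already present would also be absorbed, which is why the threshold and features here are only illustrative.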
Taxonomy
Topics: Video Surveillance and Tracking Methods · Video Analysis and Summarization · Video Coding and Compression Technologies
