Multiview Detection with Cardboard Human Modeling
Jiahao Ma, Zicheng Duan, Liang Zheng, Chuong Nguyen

TL;DR
This paper introduces a multiview pedestrian detection method that models each pedestrian as an upright, thin "cardboard" point cloud standing on the ground. The representation uses human appearance explicitly and reduces ground-plane projection errors, and the method achieves competitive results on four benchmarks.
Contribution
It proposes a new human representation scheme for multiview detection: cardboard point clouds derived from holistic human depth estimation via ray tracing.
Findings
Achieves competitive results on four benchmarks.
Reduces projection errors compared to existing methods.
Explicitly leverages human appearance for better detection.
Abstract
Multiview detection uses multiple calibrated cameras with overlapping fields of view to locate occluded pedestrians. In this field, existing methods typically adopt a "human modeling - aggregation" strategy. To find robust pedestrian representations, some intuitively incorporate 2D perception results from each frame, while others use entire frame features projected to the ground plane. However, the former does not consider the human appearance and leads to many ambiguities, and the latter suffers from projection errors due to the lack of accurate height of the human torso and head. In this paper, we propose a new pedestrian representation scheme based on human point cloud modeling. Specifically, using ray tracing for holistic human depth estimation, we model pedestrians as upright, thin cardboard point clouds on the ground. Then, we aggregate the point clouds of the pedestrian…
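The abstract does not spell out the ray tracing step, so the following is only an illustrative geometric sketch of the cardboard idea, not the authors' implementation. It assumes a calibrated pinhole camera with world-to-camera convention x_cam = R X + t, a ground plane at z = 0, a known foot pixel for the pedestrian, and a cardboard plane that stands upright at the foot point and faces the camera. The function names (`pixel_ray`, `intersect_plane`, `cardboard_points`) are hypothetical.

```python
import numpy as np

def pixel_ray(K, R, t, uv):
    """Return (origin, direction) in world coordinates of the ray through pixel uv.

    Assumes the convention x_cam = R @ X_world + t.
    """
    origin = -R.T @ t  # camera centre in world coordinates
    direction = R.T @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return origin, direction

def intersect_plane(origin, direction, point, normal):
    """Intersect a ray with the plane passing through `point` with normal `normal`."""
    denom = direction @ normal
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the plane")
    s = ((point - origin) @ normal) / denom
    return origin + s * direction

def cardboard_points(K, R, t, foot_uv, body_uvs):
    """Lift pedestrian pixels onto an upright plane standing at the foot point.

    Assumption: the ground is z = 0 and the cardboard faces the camera, i.e.
    its normal is the horizontal direction from the foot point to the camera.
    """
    # 1. Trace the foot pixel to the ground plane to locate the pedestrian.
    origin, d = pixel_ray(K, R, t, foot_uv)
    foot = intersect_plane(origin, d, np.zeros(3), np.array([0.0, 0.0, 1.0]))

    # 2. Build the upright cardboard plane through the foot point.
    normal = origin - foot
    normal[2] = 0.0  # keep the plane vertical
    normal /= np.linalg.norm(normal)

    # 3. Trace every body pixel to that plane to recover its 3D position.
    points = []
    for uv in body_uvs:
        o, d = pixel_ray(K, R, t, uv)
        points.append(intersect_plane(o, d, foot, normal))
    return foot, np.array(points)
```

Because every body pixel receives its own height on the plane, torso and head points are no longer flattened onto the ground, which is the projection error the abstract attributes to ground-plane feature projection.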
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging
