TL;DR
This paper tackles the detection and tracking of occluded ("invisible") people from monocular vision. It introduces new metrics for the task, along with models that reason in 3D using monocular depth estimation to improve performance.
Contribution
The paper proposes what it describes as the first approach to use monocular depth estimation for detecting occluded objects, and it introduces new metrics for evaluating invisible-object detection on tracking benchmarks.
Findings
Current detection and tracking systems perform dramatically worse when the task is to detect invisible (occluded) objects.
The proposed models improve F1 score by 5.0% over the state of the art.
Treating occlusion as a short-term forecasting problem enhances detection accuracy.
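To make the forecasting idea concrete, here is a minimal sketch of one way to extrapolate an occluded person's position in 3D and project it back into the image. It is not the authors' model: the constant-velocity motion assumption, the pinhole camera model, and all names (`Intrinsics`, `backproject`, `project`, `forecast_occluded`) are hypothetical choices made for illustration. In practice the depth values would come from a monocular depth estimator; here they are passed in as plain numbers.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (assumed known for this sketch)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth, K):
    """Lift pixel (u, v) with depth along the optical axis to a 3D camera-frame point."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def project(point, K):
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = point
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def forecast_occluded(track, K, steps):
    """Extrapolate an occluded target with a constant-velocity model in 3D.

    track: list of (u, v, depth) observations from the last visible frames;
           at least two entries are required to estimate a velocity.
    Returns one predicted (u, v) pixel position per future frame.
    """
    if len(track) < 2:
        raise ValueError("need at least two visible observations")
    p_prev = backproject(*track[-2], K)
    p_last = backproject(*track[-1], K)
    # Per-frame 3D displacement estimated from the last two visible frames.
    velocity = tuple(b - a for a, b in zip(p_prev, p_last))
    forecasts = []
    for k in range(1, steps + 1):
        p = tuple(c + k * v for c, v in zip(p_last, velocity))
        forecasts.append(project(p, K))
    return forecasts


# Usage: a person 10 m away moving sideways, then occluded for two frames.
K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
visible = [(320.0, 240.0, 10.0), (370.0, 240.0, 10.0)]
predicted = forecast_occluded(visible, K, steps=2)
```

Forecasting in 3D rather than in pixel space means that the predicted image-plane motion changes with the target's depth, which is one reason depth estimates can help when reasoning about where an occluded person is likely to be.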
Abstract
Monocular object detection and tracking have improved drastically in recent years, but rely on a key assumption: that objects are visible to the camera. Many offline tracking approaches reason about occluded objects post-hoc, by linking together tracklets after the object re-appears, making use of reidentification (ReID). However, online tracking in embodied robotic agents (such as a self-driving vehicle) fundamentally requires object permanence, which is the ability to reason about occluded objects before they re-appear. In this work, we re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects, focusing on the illustrative case of people. We demonstrate that current detection and tracking systems perform dramatically worse on this task. We introduce two key innovations to recover much of this performance drop. We treat occluded object detection…
