TL;DR
This paper introduces a novel monocular video-based 3D object detection method that leverages kinematic motion to enhance localization accuracy and scene understanding in self-driving applications.
Contribution
The paper proposes a novel decomposition of object orientation and a self-balancing 3D confidence, and shows that both components are critical for its kinematic model to use motion cues effectively.
Findings
Achieves state-of-the-art results on the KITTI dataset for 3D detection.
Effectively leverages scene dynamics like ego-motion and object velocity.
Improves 3D localization precision using monocular video data.
Abstract
Perceiving the physical world in 3D is fundamental for self-driving applications. Although temporal motion is an invaluable resource to human vision for detection, tracking, and depth perception, such features have not been thoroughly utilized in modern 3D object detectors. In this work, we propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization. Specifically, we first propose a novel decomposition of object orientation as well as a self-balancing 3D confidence. We show that both components are critical to enable our kinematic model to work effectively. Collectively, using only a single model, we efficiently leverage 3D kinematics from monocular videos to improve the overall localization precision in 3D object detection while also producing useful by-products of scene dynamics (ego-motion and…
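To illustrate how kinematic motion can sharpen 3D localization across video frames, here is a minimal sketch of a constant-velocity Kalman filter applied to one object's depth. It is an assumption for illustration only: this summary does not describe the paper's actual kinematic model, and the names `DepthTrack`, `predict`, `update`, and `filter_track`, the noise values `q` and `r`, and the one-dimensional state are all hypothetical. In a real system, the measurement variance `r` might plausibly be derived from a per-detection confidence, but that link is also an assumption here.

```python
from dataclasses import dataclass


@dataclass
class DepthTrack:
    """Hypothetical per-object state: depth and depth velocity with covariance."""
    z: float     # estimated depth (m)
    v: float     # estimated depth velocity (m/s)
    p_zz: float  # variance of z
    p_zv: float  # covariance between z and v
    p_vv: float  # variance of v


def predict(t: DepthTrack, dt: float, q: float = 0.1) -> DepthTrack:
    """Constant-velocity prediction: P <- F P F^T + Q, with F = [[1, dt], [0, 1]]."""
    return DepthTrack(
        z=t.z + t.v * dt,
        v=t.v,
        p_zz=t.p_zz + 2.0 * dt * t.p_zv + dt * dt * t.p_vv + q,
        p_zv=t.p_zv + dt * t.p_vv,
        p_vv=t.p_vv + q,
    )


def update(t: DepthTrack, z_meas: float, r: float) -> DepthTrack:
    """Fuse one per-frame depth measurement with variance r (H = [1, 0])."""
    s = t.p_zz + r        # innovation variance
    k_z = t.p_zz / s      # Kalman gain for depth
    k_v = t.p_zv / s      # Kalman gain for velocity
    resid = z_meas - t.z  # innovation
    return DepthTrack(
        z=t.z + k_z * resid,
        v=t.v + k_v * resid,
        p_zz=(1.0 - k_z) * t.p_zz,
        p_zv=(1.0 - k_z) * t.p_zv,
        p_vv=t.p_vv - k_v * t.p_zv,
    )


def filter_track(measurements, dt: float, r: float) -> DepthTrack:
    """Run predict/update over a sequence of per-frame depth estimates."""
    # Initialize from the first measurement with an uncertain velocity.
    track = DepthTrack(z=measurements[0], v=0.0, p_zz=r, p_zv=0.0, p_vv=10.0)
    for z_meas in measurements[1:]:
        track = update(predict(track, dt), z_meas, r)
    return track


if __name__ == "__main__":
    # Hypothetical per-frame depths (m) of an object moving away at 10 fps.
    frames = [20.0, 20.6, 20.9, 21.7, 22.0, 22.4]
    print(filter_track(frames, dt=0.1, r=1.0))
```

The sketch shows only the general idea: each frame's single-image estimate is combined with a motion-based prediction, so the fused estimate has lower variance than the prediction alone and yields a velocity estimate as a by-product.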
