TL;DR
This paper presents a deep learning approach that estimates object depth from video segmentation masks and camera motion, enabling 3D understanding in robotics and autonomous driving without a calibrated camera.
Contribution
The paper introduces a new dataset and a deep network that estimates object depth from uncalibrated camera motion and segmentation masks, and that remains robust to segmentation errors.
Findings
Effective depth estimation from uncalibrated camera motion and segmentation masks
Generalizes across domains including robotics and autonomous driving
Depth prediction remains robust to segmentation errors
Abstract
Video object segmentation, i.e., the separation of a target object from background in video, has made significant progress on real and challenging videos in recent years. To leverage this progress in 3D applications, this paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). We achieve this by, first, introducing a diverse, extensible dataset and, second, designing a novel deep network that estimates the depth of objects using only segmentation masks and uncalibrated camera movement. Our data-generation framework creates artificial object segmentations that are scaled for changes in distance between the camera and object, and our network learns to estimate object depth even with segmentation errors. We demonstrate our approach across domains using a robot camera to…
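To make the link between mask size, camera motion, and depth concrete, here is a minimal geometric sketch. It is not the paper's learned network. It assumes an ideal pinhole camera that moves straight toward a rigid object along the optical axis, so that mask area scales with the inverse square of depth. The function names (`depth_from_mask_areas`, `synthetic_areas`) and the `area_at_unit_depth` parameter are hypothetical and chosen only for illustration. No focal length is needed, because the unknown scale constant is solved for together with the depth.

```python
import numpy as np


def depth_from_mask_areas(areas, displacements):
    """Least-squares estimate of the object's depth at the first frame.

    areas: mask area in pixels for each frame.
    displacements: camera travel toward the object relative to the first
        frame, in the same unit as the returned depth.
    Assumes a pinhole camera moving along its optical axis (illustrative only).
    """
    s = np.sqrt(np.asarray(areas, dtype=float))
    d = np.asarray(displacements, dtype=float)
    # Pinhole model: s_i * (z0 - d_i) = c, rearranged to s_i * z0 - c = s_i * d_i,
    # which is linear in the unknowns (z0, c).
    A = np.column_stack([s, -np.ones_like(s)])
    b = s * d
    (z0, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return z0


def synthetic_areas(z0, displacements, area_at_unit_depth=1.0e4):
    """Generate ideal mask areas that scale as 1 / depth^2 (hypothetical helper)."""
    d = np.asarray(displacements, dtype=float)
    return area_at_unit_depth / (z0 - d) ** 2


# Usage: the camera starts 1.0 m from the object and advances in 0.1 m steps.
displacements = [0.0, 0.1, 0.2, 0.3]
areas = synthetic_areas(1.0, displacements)
estimated_depth = depth_from_mask_areas(areas, displacements)
```

With noise-free areas this closed-form fit recovers the starting depth. With real masks, segmentation errors perturb the areas directly, which is the failure mode the abstract says the learned network is designed to tolerate.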
