A Learned Stereo Depth System for Robotic Manipulation in Homes

Krishna Shankar; Mark Tjersland; Jeremy Ma; Kevin Stone; Max Bajracharya

arXiv:2109.11644·cs.RO·September 27, 2021

TL;DR

This paper introduces a fast, dense stereo depth system optimized for human environments, capable of handling challenging surfaces and objects at high resolution, suitable for robotic manipulation tasks.

Contribution

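The authors develop a learned stereo matching algorithm combined with engineered filtering, together with a training and data-mixing methodology and a sensor hardware design. The architecture runs 15x faster than approaches that perform similarly on the Middlebury and Flying Things stereo benchmarks.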

Findings

01

Achieves 30 ms processing time at 2560x2048 resolution with 384 disparities

02

Targets challenging surfaces and objects in human environments: dark, textureless, thin, reflective and specular

03

Runs 15x faster than approaches with similar accuracy on the Middlebury and Flying Things stereo benchmarks

Abstract

We present a passive stereo depth system that produces dense and accurate point clouds optimized for human environments, including dark, textureless, thin, reflective and specular surfaces and objects, at 2560x2048 resolution, with 384 disparities, in 30 ms. The system consists of an algorithm combining learned stereo matching with engineered filtering, a training and data-mixing methodology, and a sensor hardware design. Our architecture is 15x faster than approaches that perform similarly on the Middlebury and Flying Things Stereo Benchmarks. To effectively supervise the training of this model, we combine real data labelled using off-the-shelf depth sensors, as well as a number of different rendered, simulated labeled datasets. We demonstrate the efficacy of our system by presenting a large number of qualitative results in the form of depth maps and point-clouds, experiments…
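
To put the figures in the abstract in perspective, the following is a minimal sketch of the arithmetic behind them. The resolution, disparity count, and latency are taken from the abstract. The function names, the focal length, and the baseline are hypothetical and chosen only for illustration. Counting one matching candidate per pixel per disparity is an upper bound on the search space, not a claim about how the authors' network evaluates it. The depth conversion is the standard rectified-stereo relation Z = f * B / d, not something specific to this paper.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT = 2560, 2048      # image resolution in pixels
NUM_DISPARITIES = 384           # disparity search range
LATENCY_S = 0.030               # 30 ms per stereo pair


def matching_workload(width, height, num_disparities, latency_s):
    """Size of the dense disparity search space and the implied throughput.

    Assumes one candidate per pixel per disparity (an upper bound).
    """
    pixels = width * height
    candidates = pixels * num_disparities
    return {
        "pixels": pixels,
        "candidates": candidates,
        "frames_per_second": 1.0 / latency_s,
        "candidates_per_second": candidates / latency_s,
    }


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard rectified-stereo depth: Z = f * B / d.

    focal_px and baseline_m are hypothetical inputs; the abstract does not
    give the sensor's focal length or baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    stats = matching_workload(WIDTH, HEIGHT, NUM_DISPARITIES, LATENCY_S)
    for name, value in stats.items():
        print(f"{name}: {value:,.0f}")

    # Hypothetical camera: 2000 px focal length, 10 cm baseline.
    # The largest disparity in the search range corresponds to the
    # nearest depth the system could represent with that camera.
    nearest_m = depth_from_disparity(NUM_DISPARITIES, 2000.0, 0.10)
    print(f"nearest depth at d={NUM_DISPARITIES}: {nearest_m:.3f} m")
```

Under these assumptions, a large disparity range matters for manipulation because nearby objects produce large disparities. With a fixed focal length and baseline, the maximum disparity sets the minimum measurable depth.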
