A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection

Dongwon Park; Yonghyeok Seo; Dongju Shin; Jaesik Choi; Se Young Chun

arXiv:1909.07050 · cs.CV · September 17, 2019

TL;DR

This paper introduces a single multi-task deep neural network that simultaneously performs object detection, robotic grasp detection, and reasoning, achieving state-of-the-art accuracy and real-time performance in cluttered environments.

Contribution

The authors propose a unified multi-task DNN with post-processing for object detection, grasp detection, and reasoning, improving efficiency and accuracy over separate networks.

Findings

01

Achieved 98.6% accuracy on VMRD dataset

02

Attained 74.2% accuracy on Cornell dataset

03

Real-time processing at 33 FPS on VMRD and 62 FPS on Cornell

Abstract

Robotic grasp detection (GD) and object detection (OD) with reasoning have recently been investigated using deep neural networks (DNNs). Prior work has combined these tasks using separate networks so that robots can grasp specific target objects in cluttered, stacked, complex piles of novel objects using a single RGB-D camera. We propose a single multi-task DNN that yields GD, OD, and relationship reasoning among objects with simple post-processing. Our proposed methods yielded state-of-the-art performance, with accuracies of 98.6% and 74.2% and computation speeds of 33 and 62 frames per second on the VMRD and Cornell datasets, respectively. Our methods also yielded a 95.3% grasp success rate for single novel object grasping with a 4-axis robot arm and an 86.7% grasp success rate on cluttered novel objects with a Baxter robot.
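To put the reported speeds in terms of per-frame time, the frame rates in the abstract can be converted to average latency. The following is a minimal sketch of that arithmetic only; the helper name `frame_latency_ms` and the dictionary layout are illustrative assumptions and are not taken from the paper.

```python
def frame_latency_ms(fps: float) -> float:
    """Return the average per-frame processing time in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates as reported in the abstract (frames per second).
reported_fps = {"VMRD": 33, "Cornell": 62}

# Average time budget per frame implied by each reported frame rate.
latency_ms = {name: frame_latency_ms(fps) for name, fps in reported_fps.items()}

for name, ms in latency_ms.items():
    print(f"{name}: {ms:.1f} ms per frame")
```

Under these numbers, the implied budget is roughly 30 ms per frame on VMRD and roughly 16 ms per frame on Cornell.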
