A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection
Dongwon Park, Yonghyeok Seo, Dongju Shin, Jaesik Choi, Se Young Chun

TL;DR
This paper introduces a single multi-task deep neural network that simultaneously performs object detection, robotic grasp detection, and reasoning, achieving state-of-the-art accuracy and real-time performance in cluttered environments.
Contribution
The authors propose a unified multi-task DNN with post-processing for object detection, grasp detection, and reasoning, improving efficiency and accuracy over separate networks.
Findings
Achieved 98.6% accuracy on VMRD dataset
Attained 74.2% accuracy on Cornell dataset
Real-time processing at 33 FPS on VMRD and 62 FPS on Cornell
Abstract
Recently, robotic grasp detection (GD) and object detection (OD) with reasoning have been investigated using deep neural networks (DNNs). Prior work has combined these tasks using separate networks so that robots can grasp specific target objects in cluttered, stacked, complex piles of novel objects using a single RGB-D camera. We propose a single multi-task DNN that yields information on GD, OD, and relationship reasoning among objects with simple post-processing. Our proposed methods yielded state-of-the-art performance, with accuracies of 98.6% and 74.2% and computation speeds of 33 and 62 frames per second on the VMRD and Cornell datasets, respectively. Our methods also yielded a 95.3% grasp success rate for single novel object grasping with a 4-axis robot arm and an 86.7% grasp success rate for cluttered novel objects with a Baxter robot.
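The abstract does not spell out the post-processing step, so the following is only an illustrative sketch of how grasp candidates and object detections from a single network might be combined. It assumes, hypothetically, that each grasp candidate is assigned to every detected box containing its center and that the highest-scoring candidate per object is kept. The names `Box`, `Grasp`, and `best_grasp_per_object` are invented for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box (hypothetical structure, not from the paper)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class Grasp:
    """Grasp candidate described by its center, angle, and confidence score."""
    x: float
    y: float
    angle_deg: float
    score: float


def contains(box: Box, grasp: Grasp) -> bool:
    """Return True if the grasp center lies inside the box."""
    return box.x_min <= grasp.x <= box.x_max and box.y_min <= grasp.y <= box.y_max


def best_grasp_per_object(boxes: list[Box], grasps: list[Grasp]) -> dict[str, Grasp]:
    """Assumed rule: keep the highest-scoring grasp whose center falls in each box."""
    best: dict[str, Grasp] = {}
    for box in boxes:
        candidates = [g for g in grasps if contains(box, g)]
        if candidates:
            best[box.label] = max(candidates, key=lambda g: g.score)
    return best


boxes = [Box("cup", 0, 0, 10, 10), Box("pen", 20, 20, 30, 30)]
grasps = [
    Grasp(5, 5, 0.0, 0.6),
    Grasp(6, 4, 45.0, 0.9),
    Grasp(25, 25, 90.0, 0.7),
    Grasp(50, 50, 0.0, 0.99),  # lies outside every box, so it is ignored
]
result = best_grasp_per_object(boxes, grasps)
```

In a cluttered scene, the relationship reasoning output would additionally determine which object can be grasped first; that ordering step is not modeled in this sketch.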
