AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection

Thanh-Toan Do; Anh Nguyen; Ian Reid

arXiv:1709.07326 · cs.CV · March 6, 2018 · 27 citations

TL;DR

AffordanceNet is an end-to-end deep learning framework that detects multiple objects and their pixel-wise affordances in RGB images. Its reported inference time of 150 ms per image makes it a candidate for real-time robotic applications.

Contribution

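The paper introduces a multi-task architecture with two branches, one for object detection and one for affordance detection. Three components handle the multiclass affordance mask: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. The authors report that it outperforms recent state-of-the-art methods on public datasets.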

Findings

01. Outperforms recent state-of-the-art methods on public datasets

02. Runs inference in 150 ms per image (roughly 6.7 images per second), which the authors consider suitable for real-time use

03. Effective in diverse testing environments and robotic applications

Abstract

We propose AffordanceNet, a new deep learning approach to simultaneously detect multiple objects and their affordances from RGB images. AffordanceNet has two branches: an object detection branch to localize and classify the object, and an affordance detection branch to assign each pixel in the object to its most probable affordance label. The proposed framework employs three key components for effectively handling the multiclass problem in the affordance mask: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. Experimental results on public datasets show that AffordanceNet outperforms recent state-of-the-art methods by a fair margin, while its end-to-end architecture allows inference at 150 ms per image. This makes AffordanceNet well suited for real-time robotic applications. Furthermore, we demonstrate…
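The abstract describes two ideas that a small example may make more concrete: assigning each pixel to its most probable affordance label, and combining the branch losses into one multi-task objective. The sketch below illustrates both in plain Python. It is not the authors' implementation. The abstract does not specify the loss terms, so the per-pixel softmax cross-entropy and the unweighted sum are assumptions, and the function names, label names ("grasp", "contain"), and example scores are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def affordance_mask(pixel_scores):
    """Label each pixel with the index of its most probable affordance."""
    mask = []
    for row in pixel_scores:
        labels = []
        for scores in row:
            probs = softmax(scores)
            labels.append(probs.index(max(probs)))
        mask.append(labels)
    return mask

def mask_loss(pixel_scores, target):
    """Mean per-pixel multinomial cross-entropy (an assumed choice of mask loss)."""
    losses = [
        -math.log(softmax(scores)[label])
        for score_row, label_row in zip(pixel_scores, target)
        for scores, label in zip(score_row, label_row)
    ]
    return sum(losses) / len(losses)

def multi_task_loss(cls_loss, box_loss, aff_loss):
    """Unweighted sum of the branch losses (assumption: equal weights)."""
    return cls_loss + box_loss + aff_loss

# Hypothetical 2x2 region with 3 labels: 0 = background, 1 = "grasp", 2 = "contain".
scores = [
    [[2.0, 0.1, 0.1], [0.1, 3.0, 0.2]],
    [[0.0, 0.5, 2.5], [0.3, 0.2, 0.1]],
]
target = [[0, 1], [2, 0]]

predicted = affordance_mask(scores)  # [[0, 1], [2, 0]]
total = multi_task_loss(cls_loss=0.4, box_loss=0.2, aff_loss=mask_loss(scores, target))
print(predicted, round(total, 3))
```

Because every pixel competes across all labels at once, the mask is multiclass rather than binary, which is the setting the abstract says the three components are designed to handle.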

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings