AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection
Thanh-Toan Do, Anh Nguyen, Ian Reid

TL;DR
AffordanceNet is an end-to-end deep learning framework that detects objects and their pixel-wise affordances in RGB images, enabling real-time robotic applications with high accuracy.
Contribution
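It introduces a multi-task deep learning architecture for joint object and affordance detection, built on three components: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. It outperforms existing methods on public datasets.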
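The multi-task loss is what couples the object detection branch and the affordance branch during training. The NumPy sketch below shows one plausible form. It assumes an unweighted sum L = L_cls + L_loc + L_aff, a Fast R-CNN-style smooth L1 box loss, and per-pixel softmax cross-entropy over affordance labels, with label 0 as background. All function names are hypothetical, and none of these choices is taken from the authors' code. The last function mirrors the abstract's description of assigning each pixel to its most probable affordance label.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean multinomial cross-entropy; logits (N, C), labels (N,) integer ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(labels.shape[0]), labels].mean())


def smooth_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Smooth L1 box loss (Fast R-CNN style), summed over 4 coords, averaged over boxes."""
    diff = np.abs(pred - target)
    per_coord = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return float(per_coord.sum(axis=1).mean())


def multitask_loss(cls_logits, cls_labels, box_pred, box_target, aff_logits, aff_labels):
    """Hypothetical L = L_cls + L_loc + L_aff with equal weights (assumed).

    aff_logits: (H, W, A) per-pixel affordance scores for one RoI mask.
    aff_labels: (H, W) integer affordance ids (0 = background, assumed).
    """
    l_cls = softmax_cross_entropy(cls_logits, cls_labels)
    l_loc = smooth_l1(box_pred, box_target)
    num_aff = aff_logits.shape[-1]
    # Flatten pixels so each one is a separate multinomial classification.
    l_aff = softmax_cross_entropy(aff_logits.reshape(-1, num_aff), aff_labels.reshape(-1))
    return l_cls + l_loc + l_aff


def predict_affordances(aff_logits: np.ndarray) -> np.ndarray:
    """Assign each pixel its most probable affordance label (argmax over A)."""
    return aff_logits.argmax(axis=-1)
```

Treating the affordance mask as a per-pixel multinomial problem, rather than one binary mask per class, lets a single softmax choose among several affordances for every pixel of the object.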
Findings
Outperforms state-of-the-art methods on public datasets
Runs inference at 150 ms per image, fast enough for real-time use
Effective in diverse testing environments and robotic applications
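For reference, the reported per-image latency converts to throughput as shown below. This is plain arithmetic on the 150 ms figure. It ignores batching and pipelining, which the source does not discuss.

```latex
\text{throughput} = \frac{1000\ \text{ms/s}}{150\ \text{ms/image}} \approx 6.7\ \text{images/s}
```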
Abstract
We propose AffordanceNet, a new deep learning approach to simultaneously detect multiple objects and their affordances from RGB images. Our AffordanceNet has two branches: an object detection branch to localize and classify the object, and an affordance detection branch to assign each pixel in the object to its most probable affordance label. The proposed framework employs three key components for effectively handling the multiclass problem in the affordance mask: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. The experimental results on public datasets show that our AffordanceNet outperforms recent state-of-the-art methods by a fair margin, while its end-to-end architecture allows inference at 150 ms per image. This makes our AffordanceNet well suited for real-time robotic applications. Furthermore, we demonstrate…
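Two of the three components concern mask resolution. The deconvolutional layers upsample the small RoI feature map to a fixed-size affordance mask. The ground-truth mask must then be resized to that same size without corrupting its multiclass labels. The sketch below illustrates both points. The transposed-convolution size formula is the standard one (no output padding, dilation 1). The `LAYERS` settings are hypothetical values chosen only for illustration. Nearest-neighbour resizing is a swapped-in, simple choice; the paper's "robust resizing strategy" may work differently.

```python
import numpy as np


def deconv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Transposed-convolution output size (no output padding, dilation 1)."""
    return stride * (size - 1) + kernel - 2 * padding


# Hypothetical (kernel, stride, padding) per deconvolutional layer; not from the paper.
LAYERS = [(8, 4, 1), (8, 4, 1), (4, 2, 1)]


def mask_resolution(roi_size: int = 7, layers=LAYERS) -> list:
    """Spatial size after each deconvolutional layer, starting from the RoI feature map."""
    sizes = [roi_size]
    for kernel, stride, padding in layers:
        sizes.append(deconv_out(sizes[-1], kernel, stride, padding))
    return sizes


def resize_label_mask(mask: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize: every output pixel copies an existing input label,
    so no new (interpolated) affordance ids can appear."""
    in_h, in_w = mask.shape
    rows = (np.arange(out_h) * in_h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w  # source column for each output column
    return mask[rows[:, None], cols[None, :]]
```

With these assumed settings, `mask_resolution()` yields `[7, 30, 122, 244]`. The point of a label-preserving resize is that interpolation blends label values. For example, a bilinear mix of affordance ids 1 and 3 gives 2, an affordance that was never present at that location. A threshold that works for a binary mask cannot repair this once there are several classes.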
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection
