Learning Visual Affordance Grounding from Demonstration Videos

Hongchen Luo; Wei Zhai; Jing Zhang; Yang Cao; Dacheng Tao

arXiv:2108.05675·cs.CV·August 13, 2021

TL;DR

This paper introduces HAG-Net, a network that uses hand position and action cues from demonstration videos to improve visual affordance grounding, achieving state-of-the-art results in segmenting interaction regions.

Contribution

It proposes a dual-branch network with hand-aided attention and semantic enhancement to better locate interaction regions by leveraging demonstration videos.

Findings

01 Achieves state-of-the-art results on two datasets.

02 Effectively leverages hand cues to improve segmentation.

03 Outperforms existing appearance-based methods.

Abstract

Visual affordance grounding aims to segment all possible interaction regions between people and objects from an image or video, which benefits many applications, such as robot grasping and action recognition. However, existing methods mainly rely on the appearance features of objects to segment each region of the image, and they face two problems: (i) an object can contain multiple possible regions that people interact with; and (ii) the same object region can support multiple possible human interactions. To address these problems, we propose a Hand-aided Affordance Grounding Network (HAG-Net) that leverages the auxiliary clues provided by the position and action of the hand in demonstration videos to eliminate these ambiguities and better locate the interaction regions in the object. Specifically, HAG-Net has a dual-branch structure to process the…

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory