Zero-Shot Human-Object Interaction Recognition via Affordance Graphs

Alessio Sarullo; Tingting Mu

arXiv:2009.01039·cs.CV·September 3, 2020·1 cites

Open Access

TL;DR

This paper introduces a zero-shot human-object interaction recognition method that uses an affordance graph, which encodes whether an action can be performed on a given object, to recognize interactions involving unseen actions.

Contribution

It presents a graph-based approach that brings knowledge external to the image content into zero-shot recognition, and reports results above the current state of the art on several datasets, including HICO and HICO-DET.

Findings

01 Outperforms the current state of the art on the HICO and HICO-DET datasets

02 Models affordance relations between actions and objects to recognize interactions involving unseen actions

03 Uses a new loss function that distills the graph's knowledge into the model and regularizes learnt representations

Abstract

We propose a new approach for Zero-Shot Human-Object Interaction Recognition in the challenging setting that involves interactions with unseen actions (as opposed to just unseen combinations of seen actions and objects). Our approach makes use of knowledge external to the image content in the form of a graph that models affordance relations between actions and objects, i.e., whether an action can be performed on the given object or not. We propose a loss function with the aim of distilling the knowledge contained in the graph into the model, while also using the graph to regularise learnt representations by imposing a local structure on the latent space. We evaluate our approach on several datasets (including the popular HICO and HICO-DET) and show that it outperforms the current state of the art.
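The abstract does not give the loss itself, so the following is only a minimal sketch of the two ideas it names: distilling graph knowledge into the predictions, and imposing local structure on the latent space. Everything here is an assumption made for illustration, not the authors' formulation: the binary affordance matrix `A` (actions by objects), the cosine similarity used to relate actions, the binary cross-entropy used as the distillation term, and the function names.

```python
import numpy as np


def affordance_similarity(A):
    """Cosine similarity between actions, based on the objects they share.

    A is a hypothetical binary matrix of shape (num_actions, num_objects)
    with A[a, o] = 1 when action a can be performed on object o.
    """
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    A_unit = A / np.maximum(norms, 1e-12)  # avoid division by zero
    S = A_unit @ A_unit.T
    np.fill_diagonal(S, 0.0)  # an action is not its own neighbour
    return S


def distillation_term(action_probs, object_ids, A):
    """Binary cross-entropy between predicted action probabilities and the
    affordance column of each sample's object (an assumed soft target).

    action_probs: (batch, num_actions) probabilities in [0, 1]
    object_ids:   (batch,) integer index of the object in each sample
    """
    targets = A[:, object_ids].T  # (batch, num_actions)
    p = np.clip(action_probs, 1e-7, 1 - 1e-7)
    bce = targets * np.log(p) + (1 - targets) * np.log(1 - p)
    return float(-np.mean(bce))


def structure_term(Z, S):
    """Weighted mean of squared distances between action embeddings.

    Actions that are similar in the graph (large S[i, j]) are penalised
    for lying far apart in the latent space.
    Z: (num_actions, dim) action embeddings
    """
    diff = Z[:, None, :] - Z[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return float(np.sum(S * sq_dist) / max(np.sum(S), 1e-12))


# Toy graph: actions (ride, hold, eat) x objects (bicycle, apple, horse).
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0]], dtype=float)
S = affordance_similarity(A)

rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 4))               # stand-in for learnt embeddings
probs = rng.uniform(size=(2, 3))          # stand-in for model outputs
objects = np.array([0, 1])                # bicycle, apple

total = distillation_term(probs, objects, A) + 0.1 * structure_term(Z, S)
print(total)
```

The weight `0.1` on the structure term is an arbitrary placeholder. In a real training loop both terms would be written in an autodiff framework and added to the supervised loss on seen interactions.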

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning