Zero-Shot Human-Object Interaction Recognition via Affordance Graphs
Alessio Sarullo, Tingting Mu

TL;DR
This paper introduces a zero-shot human-object interaction recognition method that uses an affordance graph to model which actions can be performed on which objects, enabling recognition of interactions that involve unseen actions.
Contribution
It presents a graph-based approach that incorporates external affordance knowledge into zero-shot recognition and outperforms the previous state of the art on HICO and HICO-DET.
Findings
Outperforms current state-of-the-art on HICO and HICO-DET datasets
Models affordance relations between actions and objects to recognize interactions with unseen actions
Uses a loss function that distills the graph's knowledge into the model and regularizes the learnt representations
Abstract
We propose a new approach for Zero-Shot Human-Object Interaction Recognition in the challenging setting that involves interactions with unseen actions (as opposed to just unseen combinations of seen actions and objects). Our approach makes use of knowledge external to the image content in the form of a graph that models affordance relations between actions and objects, i.e., whether an action can be performed on the given object or not. We propose a loss function with the aim of distilling the knowledge contained in the graph into the model, while also using the graph to regularise learnt representations by imposing a local structure on the latent space. We evaluate our approach on several datasets (including the popular HICO and HICO-DET) and show that it outperforms the current state of the art.
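To make the two roles of the graph concrete, here is a minimal Python sketch. It is not the authors' loss function: the toy graph, the action and object names, and the functions `affordance_targets`, `distillation_loss`, and `graph_regulariser` are all hypothetical, chosen only for illustration. The sketch assumes the graph can be read as a table of which actions are possible on which objects, uses a binary cross-entropy term as a stand-in for "distilling" that table into the predictions, and uses a neighbour-distance penalty as a stand-in for "imposing a local structure on the latent space".

```python
import math

# Hypothetical toy affordance graph: action -> objects it can be performed on.
AFFORDANCES = {
    "ride": {"bicycle", "horse"},
    "feed": {"horse", "dog"},
    "repair": {"bicycle"},
}
ACTIONS = sorted(AFFORDANCES)  # fixed action order: feed, repair, ride


def affordance_targets(obj):
    """Return 1.0 for each action the graph allows on obj, else 0.0."""
    return [1.0 if obj in AFFORDANCES[a] else 0.0 for a in ACTIONS]


def distillation_loss(probs, obj):
    """Assumed stand-in: binary cross-entropy between predicted action
    probabilities and the graph-derived targets for the given object."""
    eps = 1e-7
    total = 0.0
    for p, t in zip(probs, affordance_targets(obj)):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(ACTIONS)


def graph_regulariser(embeddings):
    """Assumed stand-in: sum of squared distances between the embeddings
    of actions that share at least one object in the graph."""
    total = 0.0
    for i, a in enumerate(ACTIONS):
        for b in ACTIONS[i + 1:]:
            if AFFORDANCES[a] & AFFORDANCES[b]:
                total += sum(
                    (x - y) ** 2 for x, y in zip(embeddings[a], embeddings[b])
                )
    return total
```

In this sketch, predictions that agree with the graph for the detected object receive a lower `distillation_loss`, and `graph_regulariser` shrinks as graph-neighbouring actions move closer together in the embedding space. In a training setup the two terms would be combined with a weighting that this sketch leaves unspecified.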
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
