Learning Human-Object Interactions by Graph Parsing Neural Networks
Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

TL;DR
This paper introduces GPNN, a neural network framework that models human-object interactions in images and videos by inferring parse graphs, significantly improving detection accuracy over existing methods.
Contribution
The paper presents GPNN, an end-to-end differentiable neural network that incorporates structural knowledge through graph parsing for human-object interaction (HOI) detection.
Findings
Outperforms state-of-the-art methods on the HICO-DET, V-COCO, and CAD-120 benchmarks.
Scalable to large datasets and applicable to spatial-temporal data.
Infers a parse graph for each scene, consisting of an adjacency matrix (the HOI graph structure) and node labels.
Abstract
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatial-temporal settings. The code is available at https://github.com/SiyuanQi/gpnn.
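To make the iterative inference described in the abstract more concrete, here is a minimal NumPy sketch of one possible message-passing loop that alternates between estimating a soft adjacency matrix and updating node states, then reads out node labels. It is not the authors' implementation (see the repository linked above for that): the function name `parse_graph`, the weight shapes, the single-layer link, message, and update steps, and the random untrained weights are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def parse_graph(node_feats, num_labels, steps=3, seed=0):
    """Toy parse-graph inference (hypothetical helper, illustration only).

    Returns a soft adjacency matrix of shape (n, n) and per-node label
    probabilities of shape (n, num_labels).
    """
    rng = np.random.default_rng(seed)
    n, d = node_feats.shape

    # Random, untrained weights: assumptions standing in for learned ones.
    w_link = rng.normal(scale=0.1, size=(2 * d,))
    w_msg = rng.normal(scale=0.1, size=(d, d))
    w_upd = rng.normal(scale=0.1, size=(2 * d, d))
    w_out = rng.normal(scale=0.1, size=(d, num_labels))

    h = node_feats.astype(float)
    adj = np.zeros((n, n))
    for _ in range(steps):
        # Link step: score every ordered pair (i, j) from [h_i, h_j].
        pairs = np.concatenate(
            [np.repeat(h[:, None, :], n, axis=1),
             np.repeat(h[None, :, :], n, axis=0)],
            axis=-1,
        )  # shape (n, n, 2d)
        adj = sigmoid(pairs @ w_link)  # soft adjacency in (0, 1)
        np.fill_diagonal(adj, 0.0)     # no self-edges in this sketch

        # Message step: node i sums messages from j weighted by adj[i, j].
        messages = adj @ (h @ w_msg)   # shape (n, d)

        # Update step: combine the old state with incoming messages.
        h = np.tanh(np.concatenate([h, messages], axis=-1) @ w_upd)

    # Readout step: per-node label distribution.
    label_probs = softmax(h @ w_out, axis=-1)
    return adj, label_probs
```

For example, `adj, probs = parse_graph(np.random.default_rng(1).normal(size=(5, 16)), num_labels=4)` yields a 5×5 soft adjacency matrix and a 5×4 matrix of label probabilities. With trained weights, thresholding `adj` and taking the argmax over `probs` would give a discrete parse graph; here the weights are random, so the outputs only demonstrate shapes and data flow.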
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
