Learning to Detect Human-Object Interactions

Yu-Wei Chao; Yunfan Liu; Xieyang Liu; Huayi Zeng; Jia Deng

arXiv:1702.05448 · cs.CV · March 2, 2018 · 26 citations

Open Access

TL;DR

This paper introduces HICO-DET, a new benchmark dataset for human-object interaction (HOI) detection, and proposes HO-RCNN, a neural network model that exploits human-object spatial relations to improve HOI detection accuracy in static images.

Contribution

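The paper presents HICO-DET, a large HOI detection benchmark built by adding instance annotations to the HICO classification benchmark. It also presents HO-RCNN, a new neural network model that improves HOI detection with Interaction Patterns, a DNN input that encodes the spatial relation between the human and object bounding boxes.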

Findings

01 HO-RCNN significantly outperforms baseline methods on HICO-DET.

02 Interaction Patterns effectively capture human-object spatial relations.

03 HICO-DET provides a comprehensive benchmark for future HOI detection research.

Abstract

We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them. HOI detection is a fundamental problem in computer vision as it provides semantic information about the interactions among the detected objects. We introduce HICO-DET, a new large benchmark for HOI detection, by augmenting the current HICO classification benchmark with instance annotations. To solve the task, we propose Human-Object Region-based Convolutional Neural Networks (HO-RCNN). At the core of our HO-RCNN is the Interaction Pattern, a novel DNN input that characterizes the spatial relations between two bounding boxes. Experiments on HICO-DET demonstrate that our HO-RCNN, by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI…
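The abstract defines an HOI detection as a human box, an object box, and an interaction label, and it describes the Interaction Pattern as a DNN input that encodes the spatial relation between the two boxes. The sketch below illustrates one way to build such an input. It is not the authors' code. It assumes a two-channel binary mask, with one channel per box, cropped to the tightest window that encloses both boxes. The following details are assumptions: the 64×64 default size, stretching the window to a square (the paper may instead pad to preserve aspect ratio), the label string format, and the hypothetical names `HOIDetection`, `attention_window`, `rasterize`, and `interaction_pattern`.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x1, y1, x2, y2) in image pixels; assumes positive-area boxes (x2 > x1, y2 > y1).
Box = Tuple[float, float, float, float]


@dataclass
class HOIDetection:
    """One HOI detection: a human box, an object box, and the interaction label linking them."""
    human_box: Box
    object_box: Box
    hoi_class: str  # hypothetical label format, e.g. "ride bicycle"
    score: float


def attention_window(h: Box, o: Box) -> Box:
    """Tightest axis-aligned window enclosing both boxes."""
    return (min(h[0], o[0]), min(h[1], o[1]), max(h[2], o[2]), max(h[3], o[3]))


def rasterize(box: Box, win: Box, size: int) -> List[List[int]]:
    """Binary size x size mask: 1 where a grid-cell center falls inside `box`.

    The attention window `win` is stretched onto the square grid (an assumption:
    no aspect-ratio-preserving padding is applied here).
    """
    wx1, wy1, wx2, wy2 = win
    sx = size / (wx2 - wx1)  # horizontal scale from pixels to grid cells
    sy = size / (wy2 - wy1)  # vertical scale from pixels to grid cells
    bx1, bx2 = (box[0] - wx1) * sx, (box[2] - wx1) * sx
    by1, by2 = (box[1] - wy1) * sy, (box[3] - wy1) * sy
    mask = []
    for r in range(size):
        cy = r + 0.5  # center of row r in grid coordinates
        mask.append([1 if (bx1 <= c + 0.5 <= bx2 and by1 <= cy <= by2) else 0
                     for c in range(size)])
    return mask


def interaction_pattern(h: Box, o: Box, size: int = 64) -> List[List[List[int]]]:
    """Two-channel pattern: channel 0 marks the human box, channel 1 the object box."""
    win = attention_window(h, o)
    return [rasterize(h, win, size), rasterize(o, win, size)]
```

For example, take a human box (0, 0, 10, 10), an object box (10, 0, 20, 10), and `size=4`. Every row of the human channel is `[1, 1, 0, 0]` and every row of the object channel is `[0, 0, 1, 1]`. Because the image is cropped to the attention window, the pattern encodes where the two boxes sit relative to each other, independent of their absolute position in the image.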

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition