GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency

Dongming Yang; YueXian Zou; Jian Zhang; Ge Li

arXiv:2003.05242·cs.CV·March 12, 2020·1 cites


Open Access

TL;DR

GID-Net introduces a novel two-stage reasoning mechanism with GID blocks to effectively capture long-range dependencies for improved human-object interaction detection in images.

Contribution

The paper proposes GID-Net, a multi-stream network with GID blocks that model global and instance-level dependencies, advancing HOI detection accuracy.

Findings

01 Outperforms state-of-the-art on V-COCO and HICO-DET benchmarks.

02 Effectively captures long-range pixel dependencies.

03 Demonstrates improved interaction detection performance.

Abstract

Since detecting and recognizing individual humans or objects is not adequate to understand the visual world, learning how humans interact with surrounding objects has become a core technology. However, convolution operations are weak at depicting visual interactions between instances, since they only build blocks that process one local neighborhood at a time. To address this problem, we draw on how humans perceive human-object interactions (HOIs) to introduce a two-stage trainable reasoning mechanism, referred to as the GID block. The GID block breaks through local neighborhoods and captures long-range dependencies between pixels at both the global level and the instance level of the scene to help detect interactions between instances. Furthermore, we construct a multi-stream network called GID-Net, which is a human-object interaction detection framework consisting of a human branch, an object branch and an…
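To make the contrast with local convolution concrete, the following is a minimal NumPy sketch of long-range dependency at two levels. It is not the authors' GID block, whose details are not given in this excerpt. It substitutes a generic non-local (softmax attention) aggregation as a stand-in. The function names `global_dependency` and `instance_dependency`, the `box` format, and the scaling by the square root of the channel count are all assumptions made for illustration.

```python
import numpy as np


def global_dependency(features):
    """Hypothetical global-level step: every pixel attends to every pixel.

    features: array of shape (H, W, C).
    Returns (refined, weights), where refined has shape (H, W, C) and
    weights has shape (H, W, H, W).
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)                # flatten the spatial grid
    scores = x @ x.T / np.sqrt(c)                 # pairwise similarity, (HW, HW)
    scores -= scores.max(axis=1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    context = weights @ x                         # aggregate from all positions
    refined = (x + context).reshape(h, w, c)      # residual connection
    return refined, weights.reshape(h, w, h, w)


def instance_dependency(features, box):
    """Hypothetical instance-level step: one instance attends to the scene.

    box: (y0, y1, x0, x1) in feature-map coordinates (an assumed format).
    Returns (context, weights), where context has shape (C,) and weights
    has shape (H, W).
    """
    y0, y1, x0, x1 = box
    h, w, c = features.shape
    # Mean-pool the features inside the box to get one query vector.
    query = features[y0:y1, x0:x1].reshape(-1, c).mean(axis=0)
    x = features.reshape(h * w, c)
    scores = x @ query / np.sqrt(c)               # similarity to every pixel
    scores -= scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()
    context = weights @ x                         # scene context for the instance
    return context, weights.reshape(h, w)
```

In this sketch a single output position can draw on the whole feature map in one step, whereas a 3x3 convolution would need many stacked layers to relate two distant positions. A human box and an object box could each be passed to `instance_dependency` to obtain per-instance context vectors.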

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection

Methods: Convolution