GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency
Dongming Yang, YueXian Zou, Jian Zhang, Ge Li

TL;DR
GID-Net introduces a novel two-stage reasoning mechanism with GID blocks to effectively capture long-range dependencies for improved human-object interaction detection in images.
Contribution
The paper proposes GID-Net, a multi-stream network with GID blocks that model global and instance-level dependencies, advancing human-object interaction (HOI) detection accuracy.
Findings
Outperforms state-of-the-art methods on the V-COCO and HICO-DET benchmarks.
Effectively captures long-range pixel dependencies.
Demonstrates improved interaction detection performance.
Abstract
Since detecting and recognizing individual humans or objects is not adequate to understand the visual world, learning how humans interact with surrounding objects has become a core technology. However, convolution operations are weak at depicting visual interactions between instances, since they only build blocks that process one local neighborhood at a time. To address this problem, we learn from human perception in observing HOIs to introduce a two-stage trainable reasoning mechanism, referred to as the GID block. The GID block breaks through local neighborhoods and captures long-range dependencies between pixels at both the global level and the instance level of the scene to help detect interactions between instances. Furthermore, we construct a multi-stream network called GID-Net, a human-object interaction detection framework consisting of a human branch, an object branch and an…
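The contrast the abstract draws between a convolution's local neighborhood and long-range pixel dependency can be illustrated with a small sketch. The code below is not the GID block, whose internals are not given in this summary; it is a generic dot-product attention over all pixels, written in plain Python, and the function names `attention_weights` and `global_dependency` are hypothetical. It shows only the general idea that each pixel's output can depend on every other pixel, however far apart they are.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_weights(features, i):
    """Weights that pixel i assigns to every pixel, itself included.

    features: list of N pixel feature vectors (a flattened feature map).
    Similarity is a scaled dot product; this choice is an assumption
    made for illustration, not taken from the paper.
    """
    scale = math.sqrt(len(features[0]))
    return softmax([dot(features[i], f) / scale for f in features])


def global_dependency(features):
    """Add to each pixel a weighted sum of all pixels (residual form)."""
    channels = len(features[0])
    out = []
    for i in range(len(features)):
        w = attention_weights(features, i)
        aggregated = [
            sum(w[j] * features[j][k] for j in range(len(features)))
            for k in range(channels)
        ]
        out.append([features[i][k] + aggregated[k] for k in range(channels)])
    return out


# Four pixels in a row; pixels 0 and 3 are far apart but look alike.
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
w0 = attention_weights(feats, 0)
# Pixel 0 weights the distant pixel 3 more than its immediate neighbor 1.
print(w0[3] > w0[1])
```

A 3x3 convolution centered on pixel 0 would never see pixel 3 in a single layer, whereas here the weight depends on feature similarity rather than distance.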
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection
Methods: Convolution
