Knowledge Guided Bidirectional Attention Network for Human-Object Interaction Detection
Jingjia Huang, Baixiang Yang

TL;DR
This paper introduces a knowledge-guided top-down attention mechanism for human-object interaction (HOI) detection. It combines this mechanism with bottom-up attention in a unified model so that the model can better discriminate the interaction within each human-object pair.
Contribution
It proposes a top-down attention mechanism guided by knowledge of the target human-object pair. This mechanism is unified with bottom-up (scene-context) attention in a single encoder-decoder model for HOI detection.
Findings
Achieves competitive results on the V-COCO dataset
Outperforms existing methods on the HICO-DET dataset
Demonstrates the effectiveness of combining top-down and bottom-up attention
Abstract
Human-Object Interaction (HOI) detection is a challenging task that requires distinguishing the interaction between a human-object pair. Attention-based relation parsing is a popular and effective strategy in HOI detection. However, current methods perform relation parsing in a "bottom-up" manner. We argue that relying on bottom-up parsing alone in HOI is counter-intuitive and could lead to the diffusion of attention. We therefore introduce a novel knowledge-guided top-down attention into HOI and propose to model relation parsing as a "look and search" process. The model first performs scene-context modeling (i.e., look). Then, given knowledge of the target pair, it searches for the visual clues that discriminate the interaction between that pair. We implement this process by unifying bottom-up and top-down attention in a single encoder-decoder based model. The experimental…
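The "look and search" idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. Single-head scaled dot-product attention without learned key/value projections stands in for the model's attention layers. "Look" is plain self-attention over scene tokens. "Search" is cross-attention driven by a pair query. The pair query is formed by concatenating the human and object features and projecting them with a hypothetical matrix `w_q`. All function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


def look(scene_tokens):
    """'Look' (bottom-up): every scene token attends to every other token."""
    context, _ = attention(scene_tokens, scene_tokens, scene_tokens)
    return scene_tokens + context  # residual connection keeps the original features


def search(context, human_feat, object_feat, w_q):
    """'Search' (top-down): a query built from the target pair attends over the scene.

    Hypothetical choice: pair knowledge = concat(human, object) projected by w_q.
    """
    pair_query = np.concatenate([human_feat, object_feat]) @ w_q  # shape (d,)
    clue, weights = attention(pair_query[None, :], context, context)
    return clue[0], weights[0]  # one clue vector and its attention map


# Usage with random stand-in features (16 scene tokens, feature size 8).
rng = np.random.default_rng(0)
d = 8
scene = rng.normal(size=(16, d))
human, obj = scene[3], scene[10]  # pretend tokens 3 and 10 are the target pair
w_q = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)

ctx = look(scene)
clue, attn_map = search(ctx, human, obj, w_q)
```

The split mirrors the abstract's argument. The bottom-up step is the same for every pair, while the top-down step changes its attention map depending on which human-object pair is queried. This pair-specific focus is what the authors argue counters the diffusion of attention.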
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Human Pose and Action Recognition
