Knowledge Guided Bidirectional Attention Network for Human-Object Interaction Detection

Jingjia Huang; Baixiang Yang

arXiv:2207.07979·cs.CV·July 19, 2022

Open Access

TL;DR

This paper introduces a novel knowledge-guided top-down attention mechanism for human-object interaction detection, combining it with bottom-up attention in a unified model to improve discrimination accuracy.

Contribution

It proposes a new top-down attention approach guided by scene knowledge, integrated with bottom-up attention in a single encoder-decoder model for HOI detection.

Findings

1. Achieves competitive results on V-COCO dataset

2. Outperforms existing methods on HICO-DET dataset

3. Demonstrates the effectiveness of combined top-down and bottom-up attention

Abstract

Human-Object Interaction (HOI) detection is a challenging task that requires distinguishing the interaction between a human-object pair. Attention-based relation parsing is a popular and effective strategy used in HOI. However, current methods execute relation parsing in a "bottom-up" manner. We argue that using the bottom-up parsing strategy on its own in HOI is counter-intuitive and could lead to the diffusion of attention. Therefore, we introduce a novel knowledge-guided top-down attention into HOI, and propose to model the relation parsing as a "look and search" process: execute scene-context modeling (i.e., look), and then, given the knowledge of the target pair, search for visual clues that discriminate the interaction between the pair. We implement the process by unifying the bottom-up and top-down attention in a single encoder-decoder based model. The experimental…
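The "look and search" process described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention without learned projections, random vectors standing in for region features, and a hypothetical pair query formed by averaging the human and object features. The names `look`, `search`, `attend`, and `pair_query` are illustrative only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def look(scene_feats):
    """Bottom-up step: each scene feature attends to all scene features."""
    return [attend(f, scene_feats, scene_feats)[0] for f in scene_feats]


def search(pair_query, context):
    """Top-down step: a query derived from the target pair attends over the context."""
    return attend(pair_query, context, context)


random.seed(0)
DIM = 8
# Five toy region features standing in for detector outputs (assumption).
scene = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
human, obj = scene[0], scene[1]

# Hypothetical pair "knowledge": the mean of the two instance features.
pair_query = [(h + o) / 2 for h, o in zip(human, obj)]

context = look(scene)                       # "look": scene-context modeling
clue, weights = search(pair_query, context)  # "search": pair-guided attention
```

In this sketch, `weights` shows how strongly the pair query attends to each contextualized region, and `clue` is the resulting feature that a downstream interaction classifier could consume.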

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Human Pose and Action Recognition

Methods: Diffusion