HOKEM: Human and Object Keypoint-based Extension Module for Human-Object Interaction Detection

Yoshiki Ito

arXiv:2306.14260 · cs.CV · June 27, 2023

Open Access

TL;DR

HOKEM is an extension module for human-object interaction (HOI) detection. It pairs an object keypoint extraction method with a human-object adaptive GCN (HO-AGCN) to improve the accuracy of existing HOI detection models.

Contribution

The paper proposes HOKEM, an extension module that combines an object keypoint extraction method with a human-object adaptive GCN (HO-AGCN) to improve the accuracy of conventional HOI detection models.

Findings

01 Boosted HOI detection accuracy on the V-COCO dataset

02 Effective object shape representation across various objects

03 Improved spatial relationship modeling between keypoints

Abstract

Human-object interaction (HOI) detection, which captures relationships between humans and objects, is an important task in the semantic understanding of images. When human and object keypoints extracted from an image are processed with a graph convolutional network (GCN) to detect HOI, it is crucial to extract appropriate object keypoints regardless of the object type and to design a GCN that accurately captures the spatial relationships between keypoints. This paper presents the human and object keypoint-based extension module (HOKEM), an easy-to-use extension module that improves the accuracy of conventional detection models. The proposed object keypoint extraction method is simple yet accurately represents the shapes of various objects. Moreover, the proposed human-object adaptive GCN (HO-AGCN), which introduces adaptive graph optimization and an attention mechanism, accurately captures…
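
To make the idea of a GCN with an adaptive graph over human and object keypoints more concrete, here is a minimal sketch in plain Python. It is not the paper's HO-AGCN: the function names, the three-keypoint toy graph, and the choice to add a learnable offset to a fixed, row-normalized adjacency matrix are all assumptions made for illustration, and the attention mechanism mentioned in the abstract is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def row_normalize(adj):
    """Scale each row to sum to 1 so aggregation is a weighted average."""
    normalized = []
    for row in adj:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def adaptive_graph_conv(features, fixed_adj, learned_offset, weight):
    """One graph convolution with an adjustable graph (hypothetical helper).

    features:       N x C_in keypoint features (here, 2D coordinates)
    fixed_adj:      N x N predefined, normalized adjacency
    learned_offset: N x N trainable correction added to the fixed graph
    weight:         C_in x C_out feature transform
    """
    n = len(features)
    combined = [[fixed_adj[i][j] + learned_offset[i][j] for j in range(n)]
                for i in range(n)]
    aggregated = matmul(combined, features)  # mix features of linked keypoints
    return matmul(aggregated, weight)        # project to output channels


# Toy graph (assumed): two human keypoints and one object keypoint.
keypoints = [[0.0, 0.0],   # elbow
             [2.0, 0.0],   # wrist
             [2.0, 2.0]]   # object keypoint
# Self-loops plus elbow-wrist and wrist-object edges.
fixed_adj = row_normalize([[1, 1, 0],
                           [1, 1, 1],
                           [0, 1, 1]])
zero_offset = [[0.0] * 3 for _ in range(3)]   # offset before any training
identity = [[1.0, 0.0], [0.0, 1.0]]

print(adaptive_graph_conv(keypoints, fixed_adj, zero_offset, identity))
```

With a zero offset the layer reduces to an ordinary graph convolution over the predefined edges. In training, a nonzero `learned_offset` would let the model strengthen or add links (for example, between the elbow and the object keypoint) that the fixed graph does not contain.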

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: Graph Convolutional Network