HOKEM: Human and Object Keypoint-based Extension Module for Human-Object Interaction Detection
Yoshiki Ito

TL;DR
HOKEM is an extension module for human-object interaction (HOI) detection that adds an object keypoint extraction method and an adaptive GCN to improve the accuracy of conventional detection models.
Contribution
The paper proposes a new extension module with an object keypoint extraction method and an adaptive GCN for better HOI detection accuracy.
Findings
Improved HOI detection accuracy on the V-COCO dataset
Effective object shape representation across various objects
Improved spatial relationship modeling between keypoints
Abstract
Human-object interaction (HOI) detection for capturing relationships between humans and objects is an important task in the semantic understanding of images. When processing human and object keypoints extracted from an image using a graph convolutional network (GCN) to detect HOI, it is crucial to extract appropriate object keypoints regardless of the object type and to design a GCN that accurately captures the spatial relationships between keypoints. This paper presents the human and object keypoint-based extension module (HOKEM) as an easy-to-use extension module to improve the accuracy of conventional detection models. The proposed object keypoint extraction method is simple yet accurately represents the shapes of various objects. Moreover, the proposed human-object adaptive GCN (HO-AGCN), which introduces adaptive graph optimization and an attention mechanism, accurately captures…
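To make the idea of an adaptive GCN over keypoints concrete, here is a minimal sketch of one adaptive graph convolution step. It is not the paper's HO-AGCN. It assumes the common adaptive-GCN pattern of summing a fixed adjacency, a learnable offset, and a data-dependent attention matrix. The function names, the toy three-keypoint graph, and all numeric values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def adaptive_graph_conv(x, a_fixed, b_learned, w):
    """One adaptive graph convolution step (illustrative assumption only).

    x         : N x C keypoint features (e.g. normalized 2D coordinates)
    a_fixed   : N x N predefined adjacency (e.g. skeleton links)
    b_learned : N x N offset that training would optimize
    w         : C x C_out weight matrix
    """
    n = len(x)
    # Data-dependent attention: softmax over pairwise feature similarity.
    x_t = [list(col) for col in zip(*x)]
    attn = softmax_rows(matmul(x, x_t))
    # Adaptive adjacency = fixed graph + learnable offset + attention.
    adj = [[a_fixed[i][j] + b_learned[i][j] + attn[i][j] for j in range(n)]
           for i in range(n)]
    # Aggregate neighbor features, project them, then apply ReLU.
    agg = matmul(adj, x)
    return [[max(0.0, v) for v in row] for row in matmul(agg, w)]

# Toy graph: two human keypoints and one object keypoint (hypothetical values).
x = [[0.2, 0.1], [0.4, 0.5], [0.6, 0.5]]
a_fixed = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
b_learned = [[0.0] * 3 for _ in range(3)]  # untrained offset
w = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
out = adaptive_graph_conv(x, a_fixed, b_learned, w)
print(len(out), len(out[0]))
```

In a trained model, `b_learned` and `w` would be optimized by backpropagation, so the graph can add or strengthen links between human and object keypoints that the fixed adjacency does not contain.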
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: Graph Convolutional Network
