Geometric Features Enhanced Human-Object Interaction Detection
Manli Zhu, Edmond S. L. Ho, Shuang Chen, Longzhi Yang, Hubert P. H. Shum

TL;DR
This paper introduces GeoHOI, a novel Transformer-based human-object interaction detection model that leverages geometric features and a self-supervised keypoint learning method to improve performance, especially under occlusion.
Contribution
The paper proposes GeoHOI, an end-to-end Transformer model enhanced with geometric features and UniPointNet for consistent keypoint representation across categories, improving HOI detection accuracy.
Findings
Outperforms state-of-the-art on V-COCO
Achieves competitive results on HICO-DET
Demonstrates effectiveness in post-disaster rescue scenarios
Abstract
Cameras are essential vision instruments to capture images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection due to their advanced network architectures and thus promising results. However, most of them follow the one-stage design of vanilla Transformer, leaving rich geometric priors under-exploited and leading to compromised performance especially when occlusion occurs. Given that geometric features tend to outperform visual ones in occluded scenarios and offer information that complements visual cues, we propose a novel end-to-end Transformer-style HOI detection model, i.e., geometric features enhanced HOI detector (GeoHOI). One key part of the model is a new unified…
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings
