HODN: Disentangling Human-Object Feature for HOI Detection
Shuman Fang, Zhiwen Lin, Ke Yan, Jie Li, Xianming Lin, Rongrong Ji

TL;DR
This paper introduces HODN, a network that explicitly models the relationships among humans, objects, and interactions. It detects humans and objects with separate decoders, steers the interaction decoder toward human-centric regions, and limits how interaction information affects object detection.
Contribution
The paper proposes a Human and Object Disentangling Network (HODN), together with a Human-Guide Linking method and a Stop-Gradient Mechanism, to improve HOI detection by explicitly modeling the relationships among humans, objects, and interactions.
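The Stop-Gradient Mechanism is named on this page but not detailed. As a minimal sketch, assuming it blocks interaction-loss gradients from reaching the object branch while still letting them reach the human branch (an inference from the abstract's claim that interactive information disturbs object detection but helps human detection), the toy scalar autodiff below shows the effect. The `Value` class, `interaction_grads`, and the weights `w_h` and `w_o` are hypothetical stand-ins for illustration, not the authors' code.

```python
class Value:
    """Scalar with reverse-mode autodiff, just enough to illustrate stop-gradient."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # inputs this value was computed from
        self._grad_fns = grad_fns  # maps upstream gradient to each parent's gradient

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def detach(self):
        # Stop-gradient: same value, but no link back to its parents.
        return Value(self.data)

    def backward(self, grad=1.0):
        # Propagate one gradient contribution along every path to the leaves.
        self.grad += grad
        for parent, fn in zip(self._parents, self._grad_fns):
            parent.backward(fn(grad))


def interaction_grads(stop_gradient):
    """Return (d loss / d w_h, d loss / d w_o) for a toy interaction loss."""
    x = Value(1.0)    # shared image feature (toy)
    w_h = Value(2.0)  # hypothetical human-branch weight
    w_o = Value(3.0)  # hypothetical object-branch weight
    human_feat = w_h * x
    object_feat = w_o * x
    # Assumption: only the object feature is detached before the interaction branch.
    obj_in = object_feat.detach() if stop_gradient else object_feat
    inter = human_feat + obj_in
    interaction_loss = inter * inter
    interaction_loss.backward()
    return w_h.grad, w_o.grad


print(interaction_grads(stop_gradient=True))   # (10.0, 0.0)
print(interaction_grads(stop_gradient=False))  # (10.0, 10.0)
```

With the stop-gradient in place, the object weight receives no gradient from the interaction loss, while the human weight is still updated by it. In a real framework the same effect would come from that framework's own detach or stop-gradient operation.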
Findings
Achieves competitive results on the V-COCO and HICO-DET datasets.
Effectively disentangles human and object features for better interaction modeling.
Can be integrated with existing methods for improved performance.
Abstract
The task of Human-Object Interaction (HOI) detection is to detect humans and their interactions with surrounding objects, and transformer-based methods currently dominate it. However, these methods ignore the relationships among humans, objects, and interactions: 1) human features contribute more than object features to interaction prediction; 2) interactive information disturbs object detection but helps human detection. In this paper, we propose a Human and Object Disentangling Network (HODN) to model the HOI relationships explicitly, where humans and objects are first detected independently by two disentangling decoders and then processed by an interaction decoder. Considering that human features contribute more to interaction, we propose a Human-Guide Linking method to make sure the interaction decoder focuses on the human-centric regions with human…
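The data flow described in the abstract (two independent decoders, then an interaction decoder) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' architecture: `attend` is a bare single-head dot-product attention with no learned projections or layers, `hodn_forward` and all query names are hypothetical, and letting the interaction stage attend over the concatenated human and object features is one assumed reading of "processed by an interaction decoder". Human-Guide Linking is not modeled here because its description is truncated on this page.

```python
import math


def attend(queries, memory):
    """Single-head dot-product attention.

    Each query is replaced by a softmax-weighted average of the memory vectors.
    """
    dim = len(memory[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) for m in memory]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * m[d] for w, m in zip(weights, memory)) / total
            for d in range(dim)
        ])
    return outputs


def hodn_forward(image_tokens, human_queries, object_queries, interaction_queries):
    """Toy forward pass: disentangled human/object stages, then interaction."""
    # Human and object stages read the same image tokens but never see each other.
    human_feats = attend(human_queries, image_tokens)
    object_feats = attend(object_queries, image_tokens)
    # Assumption: the interaction stage attends over both sets of outputs.
    interaction_feats = attend(interaction_queries, human_feats + object_feats)
    return human_feats, object_feats, interaction_feats


# Toy inputs: three 2-d image tokens and one query per stage.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
humans, objects, interactions = hodn_forward(
    tokens, [[1.0, 0.0]], [[0.0, 1.0]], [[0.5, 0.5]]
)
print(len(humans), len(objects), len(interactions))
```

Because the two detection stages share no state, changing the object queries leaves the human features untouched, which is the sense of "disentangling" this sketch tries to convey.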
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
