Mining the Benefits of Two-stage and One-stage HOI Detection
Aixi Zhang, Yue Liao, Si Liu, Miao Lu, Yongliang Wang, Chen Gao, Xiaobo Li

TL;DR
This paper introduces a novel one-stage HOI detection framework that disentangles human-object detection from interaction classification in a cascade manner, outperforming existing methods with a reported 9.32% relative mAP gain on HICO-Det.
Contribution
The paper proposes a cascade-based one-stage HOI detection framework that disentangles human-object detection from interaction classification and outperforms existing methods by a large margin.
Findings
Achieved a 9.32% relative mAP gain on HICO-Det.
Designed a human-object pair generator based on a state-of-the-art one-stage detector.
Introduced cascaded decoders that handle human-object detection and interaction classification separately.
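The findings do not state which baseline the 9.32% figure is measured against. For reference, and assuming the standard definition, a relative gain is measured against the baseline's own mAP rather than in absolute mAP points:

```latex
\[
  \text{relative gain} = \frac{\mathrm{mAP}_{\text{new}} - \mathrm{mAP}_{\text{base}}}{\mathrm{mAP}_{\text{base}}} \times 100\%
\]
```

Under this definition, a 9.32% relative gain over a baseline of $b$ mAP corresponds to an absolute improvement of $0.0932\,b$ mAP points.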
Abstract
Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years. Recently, one-stage HOI detection methods have become popular. In this paper, we aim to explore the essential pros and cons of two-stage and one-stage methods. With this goal, we find that conventional two-stage methods mainly struggle to locate positive interactive human-object pairs, while one-stage methods struggle to make an appropriate trade-off in multi-task learning, i.e., between object detection and interaction classification. Therefore, a core problem is how to keep the strengths and discard the weaknesses of the two conventional types of methods. To this end, we propose a novel one-stage framework that disentangles human-object detection and interaction classification in a cascade manner. In detail, we first design a human-object pair generator based on a state-of-the-art…
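The abstract describes a cascade: human-object pairs are generated first, and interactions are then classified only for those pairs. Below is a minimal Python sketch of that inference-time data flow, written under stated assumptions. `HOPair`, `HOITriplet`, `cascade_hoi_inference`, and `toy_classifier` are hypothetical names. The interaction classifier is a stand-in callable, not the paper's learned decoder. Multiplying the pair score by the verb score is an assumed combination rule, not one taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class HOPair:
    """Hypothetical stage-1 output: one human-object pair proposal."""
    human_box: Box
    object_box: Box
    object_label: str
    pair_score: float  # confidence that the pair is interactive


@dataclass
class HOITriplet:
    """Final <human, verb, object> detection."""
    human_box: Box
    object_box: Box
    object_label: str
    verb: str
    score: float


def cascade_hoi_inference(
    pairs: Sequence[HOPair],
    classify_interactions: Callable[[HOPair], Dict[str, float]],
    score_threshold: float = 0.1,
) -> List[HOITriplet]:
    """Stage 2 only sees pairs produced by stage 1, so detection and
    interaction classification are handled by separate components.
    The multiplicative score fusion is an assumption of this sketch."""
    triplets: List[HOITriplet] = []
    for pair in pairs:
        verb_scores = classify_interactions(pair)  # verb -> probability
        for verb, verb_score in verb_scores.items():
            score = pair.pair_score * verb_score
            if score >= score_threshold:
                triplets.append(HOITriplet(pair.human_box, pair.object_box,
                                           pair.object_label, verb, score))
    # Highest-confidence triplets first.
    triplets.sort(key=lambda t: t.score, reverse=True)
    return triplets


# Toy inputs for illustration only (hypothetical, not data from the paper).
pairs = [
    HOPair((10, 10, 50, 100), (40, 60, 120, 110), "bicycle", 0.9),
    HOPair((200, 20, 240, 120), (300, 50, 330, 80), "cup", 0.3),
]


def toy_classifier(pair: HOPair) -> Dict[str, float]:
    """Hypothetical stand-in for a learned interaction decoder."""
    if pair.object_label == "bicycle":
        return {"ride": 0.8, "hold": 0.2}
    return {"hold": 0.5}


for t in cascade_hoi_inference(pairs, toy_classifier):
    print(t.verb, t.object_label, round(t.score, 2))
```

Keeping the two stages behind separate interfaces makes the disentangling explicit: the pair generator can be swapped or retrained without touching the interaction classifier, and vice versa.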
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection
