Diagnosing Rarity in Human-Object Interaction Detection
Mert Kilickaya, Arnold Smeulders

TL;DR
This paper investigates the challenges of detecting human-object interactions in computer vision, focusing on the impact of rarity and interaction signals like occlusion on model performance.
Contribution
The paper introduces a three-step diagnostic strategy (Detection, Identification, Recognition) to analyze the factors limiting HOI detection, and it traces the limits on recognition back to problems in the detection and identification steps.
Findings
Detection and identification are affected by occlusion and relative location.
Rarity significantly impacts recognition accuracy in HOI detection.
Understanding these factors can guide improvements in model design.
Abstract
Human-object interaction (HOI) detection is a core task in computer vision. The goal is to localize all human-object pairs and recognize their interactions. An interaction defined by a <verb, noun> tuple leads to a long-tailed visual recognition challenge, since many combinations are rarely represented. The performance of the proposed models is limited, especially for the tail categories, but little has been done to understand the reason. To that end, in this paper, we propose to diagnose rarity in HOI detection. We propose a three-step strategy, namely Detection, Identification and Recognition, where we carefully analyse the limiting factors by studying state-of-the-art models. Our findings indicate that the detection and identification steps are affected by interaction signals such as occlusion and relative location, which in turn limits the recognition accuracy.
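To make the long-tail point concrete, the sketch below counts how often each <verb, noun> tuple occurs in a set of annotations and splits the categories into rare and non-rare. It is an illustration only and not the authors' code: the function name `split_by_rarity`, the toy annotations, and the threshold of 10 instances are all assumptions chosen for the example.

```python
from collections import Counter


def split_by_rarity(annotations, rare_threshold=10):
    """Count each (verb, noun) category and split categories by frequency.

    `rare_threshold` is an assumed cut-off for this sketch: a category with
    fewer than this many instances is treated as rare.
    """
    counts = Counter(annotations)
    rare = {category for category, n in counts.items() if n < rare_threshold}
    non_rare = set(counts) - rare
    return counts, rare, non_rare


# Toy annotations (hypothetical): two frequent and two infrequent interactions.
annotations = (
    [("ride", "bicycle")] * 12
    + [("hold", "cup")] * 11
    + [("wash", "bicycle")] * 2
    + [("feed", "giraffe")]
)

counts, rare, non_rare = split_by_rarity(annotations)
print(sorted(rare))
print(sorted(non_rare))
```

Even in this toy set, half of the categories fall into the tail although they cover only a handful of instances, which is the imbalance that the diagnosis targets.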
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning
