ContextHOI: Spatial Context Learning for Human-Object Interaction Detection
Mingda Jia, Liming Zhao, Ge Li, Yun Zheng

TL;DR
ContextHOI introduces a dual-branch framework that captures spatial context to improve human-object interaction (HOI) detection, especially when instances are occluded or blurred, and achieves state-of-the-art results on HICO-DET and V-COCO.
Contribution
The paper proposes a novel dual-branch framework with context-aware supervision to enhance spatial context learning in HOI detection without extra background labels.
Findings
Achieves state-of-the-art performance on the HICO-DET and V-COCO benchmarks.
Excels in recognizing interactions with occluded or blurred instances.
Introduces the HICO-ambiguous benchmark for challenging HOI evaluation.
Abstract
Spatial contexts, such as backgrounds and surroundings, are considered critical in Human-Object Interaction (HOI) recognition, especially when the instance-centric foreground is blurred or occluded. Recent HOI detectors are usually built upon detection transformer pipelines. While such an object-detection-oriented paradigm shows promise in localizing objects, its exploration of spatial context is often insufficient for accurately recognizing human actions. To enhance the capabilities of object detectors for HOI detection, we present a dual-branch framework named ContextHOI, which efficiently captures both object detection features and spatial contexts. In the context branch, we train the model to extract informative spatial context without requiring additional hand-crafted background labels. Furthermore, we introduce context-aware spatial and semantic supervision to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Geographic Information Systems Studies · Context-Aware Activity Recognition Systems
