Effective Actor-centric Human-object Interaction Detection
Kunlun Xu, Zhimin Li, Zhijun Zhang, Leizhen Dong, Wenhui Xu, Luxin Yan, Sheng Zhong, Xu Zou

TL;DR
This paper introduces an actor-centric framework for human-object interaction detection that leverages non-local features and a novel composition strategy, significantly improving accuracy in complex scenes with multiple humans and objects.
Contribution
The proposed method uniquely combines actor-guided non-local features with a pixel-wise interaction area prediction and a center-point indexing composition strategy, advancing HOI detection performance.
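The summary names a center-point indexing composition strategy but does not say how it works. Below is a minimal sketch under assumed inputs, not the authors' implementation. It assumes that each detected actor has a per-pixel interaction-area score map, and that objects are given as (x1, y1, x2, y2) boxes. A human-object pair is composed by reading the actor's map at the object's center point. The names `compose_pairs`, `box_center`, and the `threshold` value are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixel coordinates
ScoreMap = List[List[float]]             # assumed H x W per-pixel interaction-area scores


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer (x, y) center of a box."""
    x1, y1, x2, y2 = box
    return int((x1 + x2) / 2), int((y1 + y2) / 2)


def compose_pairs(interaction_maps: List[ScoreMap],
                  object_boxes: List[Box],
                  threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Hypothetical composition step: pair actor a with object o when the actor's
    interaction-area map, indexed at the object's center point, exceeds threshold."""
    pairs = []
    for a, amap in enumerate(interaction_maps):
        h, w = len(amap), len(amap[0])
        for o, box in enumerate(object_boxes):
            cx, cy = box_center(box)
            # Clamp the center so boxes touching the border still index a valid pixel.
            cx = min(max(cx, 0), w - 1)
            cy = min(max(cy, 0), h - 1)
            score = amap[cy][cx]
            if score >= threshold:
                pairs.append((a, o, score))
    return pairs


# Usage: one actor whose predicted interaction area is the top-left quadrant.
maps = [[[0.9, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1],
         [0.1, 0.1, 0.1, 0.1],
         [0.1, 0.1, 0.1, 0.1]]]
boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]
print(compose_pairs(maps, boxes))  # [(0, 0, 0.9)]
```

Indexing a single center pixel keeps composition cheap, O(actors × objects). The trade-off is that it relies on the predicted interaction area covering the object's center.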
Findings
Achieves state-of-the-art results on V-COCO and HICO-DET benchmarks.
More robust in scenes with multiple persons and objects.
Improves detection accuracy in complex interaction scenarios.
Abstract
While Human-Object Interaction (HOI) detection has advanced tremendously in recent years, it remains challenging because images often contain complex interactions among multiple humans and objects, which inevitably lead to ambiguities. Most existing methods either generate all human-object pair candidates and infer their relationships from cropped local features in a two-stage manner, or directly predict interaction points in a one-stage procedure. However, two-stage methods lack spatial configuration and one-stage methods lack reasoning steps, which limits their performance in such complex scenes. To avoid this ambiguity, we propose a novel actor-centric framework. The main ideas are that when inferring interactions: 1) the non-local features of the entire image guided by actor position are obtained to model the relationship between the actor and context, and then…
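One way to picture the first idea, non-local image features guided by actor position, is an attention step. In this step the actor's feature acts as the query and every pixel in the image serves as a key and value. The sketch below substitutes plain scaled dot-product attention without learned projections; it is an illustration under that assumption, not the paper's module. `actor_guided_context` and its inputs are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # assumed feature_map[y][x] -> C-dim feature vector


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def actor_guided_context(feature_map: FeatureMap,
                         actor_yx: Tuple[int, int]) -> Tuple[Vector, List[float]]:
    """Attend from the actor's position to every pixel in the image (non-local),
    returning the attention-weighted context vector and the per-pixel weights."""
    ay, ax = actor_yx
    query = feature_map[ay][ax]
    pixels = [v for row in feature_map for v in row]  # flatten H x W into one list
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, p) * scale for p in pixels])
    context = [sum(w * p[c] for w, p in zip(weights, pixels))
               for c in range(len(query))]
    return context, weights


# Usage: the actor at (0, 0) attends more to the pixel with a similar feature.
feature_map = [[[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]]]
context, weights = actor_guided_context(feature_map, (0, 0))
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Because the weights span the whole image rather than a cropped region, context far from the actor can still contribute. That is the contrast the abstract draws with cropped local features.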
