Effective Actor-centric Human-object Interaction Detection

Kunlun Xu; Zhimin Li; Zhijun Zhang; Leizhen Dong; Wenhui Xu; Luxin Yan; Sheng Zhong; Xu Zou

arXiv:2202.11998·cs.CV·April 4, 2022

TL;DR

This paper introduces an actor-centric framework for human-object interaction detection that leverages non-local features and a novel composition strategy, significantly improving accuracy in complex scenes with multiple humans and objects.

Contribution

The proposed method uniquely combines actor-guided non-local features with a pixel-wise interaction area prediction and a center-point indexing composition strategy, advancing HOI detection performance.

Findings

01. Achieves state-of-the-art results on V-COCO and HICO-DET benchmarks.

02. More robust in scenes with multiple persons and objects.

03. Improves detection accuracy in complex interaction scenarios.

Abstract

While Human-Object Interaction (HOI) detection has achieved tremendous advances in recent years, it remains challenging because complex interactions among multiple humans and objects in an image inevitably lead to ambiguities. Most existing methods either generate all human-object pair candidates and then infer their relationships from cropped local features in a two-stage manner, or directly predict interaction points in a one-stage procedure. However, two-stage methods lack spatial configurations and one-stage methods lack reasoning steps, which limits their performance in such complex scenes. To avoid this ambiguity, we propose a novel actor-centric framework. The main ideas are that when inferring interactions: 1) the non-local features of the entire image guided by actor position are obtained to model the relationship between the actor and context, and then…
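One way to read "non-local features guided by actor position" is as attention in which the feature at the actor's location queries every other location of the feature map. The sketch below illustrates that general idea only; it assumes plain scaled dot-product attention over a toy grid, and the function name `actor_guided_context` and the data layout are hypothetical, not taken from the paper.

```python
import math

def actor_guided_context(features, actor_yx):
    """Illustrative only: pool a whole feature map using the actor's feature as query.

    features: H x W grid (nested lists) of C-dimensional feature vectors.
    actor_yx: (row, col) of the actor's center on that grid.
    Returns (context, weights): a C-dim context vector and one weight per location.
    """
    ay, ax = actor_yx
    query = features[ay][ax]
    scale = math.sqrt(len(query))

    # Flatten the grid so every location can be compared with the actor.
    cells = [cell for row in features for cell in row]

    # Scaled dot-product similarity between the actor feature and each location.
    scores = [sum(q * k for q, k in zip(query, cell)) / scale for cell in cells]

    # Numerically stable softmax over all locations.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of all location features: the actor-conditioned context.
    context = [
        sum(w * cell[c] for w, cell in zip(weights, cells))
        for c in range(len(query))
    ]
    return context, weights

# Toy 2 x 2 map with 2-dim features; the actor sits at the top-left cell.
toy_features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 0.0]],
]
toy_context, toy_weights = actor_guided_context(toy_features, (0, 0))
```

In this toy map, locations whose features resemble the actor's receive larger weights, so the pooled context is dominated by them. A real model would use learned projections for the query, keys, and values.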
