Interactiveness Field in Human-Object Interactions
Xinpeng Liu, Yong-Lu Li, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi-Keung Tang

TL;DR
This paper introduces the interactiveness field, a novel prior that improves human-object interaction detection by effectively distinguishing interactive pairs from non-interactive ones, leading to more accurate detection results.
Contribution
The work proposes the interactiveness field and associated energy constraints, addressing the challenge of extracting truly interactive human-object pairs in HOI detection.
Findings
Significant performance improvement over state-of-the-art methods.
Effective detection of interactive human-object pairs.
Validated on widely-used benchmarks.
Abstract
Human-Object Interaction (HOI) detection plays a core role in activity understanding. Though recent one- and two-stage methods have achieved impressive results, discovering interactive human-object pairs, an essential step, remains challenging. Both one- and two-stage methods fail to extract interactive pairs effectively and instead generate redundant negative pairs. In this work, we introduce a previously overlooked interactiveness bimodal prior: given an object in an image, after pairing it with the humans, the generated pairs are either mostly non-interactive or mostly interactive, with the former more frequent than the latter. Based on this interactiveness bimodal prior, we propose the "interactiveness field". To make the learned field compatible with real HOI image considerations, we propose new energy constraints based on the cardinality and difference in the inherent "interactiveness…
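The bimodal prior described in the abstract can be illustrated with a small Python sketch. This is not the paper's method (which learns a field under energy constraints); it only shows what "mostly non-interactive or mostly interactive" means for a single object. The function names, the toy labels, and the `margin` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def interactive_ratio_per_object(pairs):
    """Return {object_id: fraction of its human pairings labelled interactive}.

    `pairs` is an iterable of (object_id, human_id, is_interactive) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # object_id -> [interactive, total]
    for object_id, _human_id, is_interactive in pairs:
        counts[object_id][0] += int(is_interactive)
        counts[object_id][1] += 1
    return {obj: inter / total for obj, (inter, total) in counts.items()}


def is_near_a_mode(ratio, margin=0.25):
    """True if the ratio sits close to 0 or close to 1.

    `margin` is a hypothetical threshold chosen for this example only.
    """
    return ratio <= margin or ratio >= 1.0 - margin


# Hand-made toy labels for one image (not data from the paper).
toy_pairs = [
    # One ball, five people: only one of them is kicking it.
    ("ball", "h1", True),
    ("ball", "h2", False),
    ("ball", "h3", False),
    ("ball", "h4", False),
    ("ball", "h5", False),
    # One bench, three people: all of them are sitting on it.
    ("bench", "h1", True),
    ("bench", "h2", True),
    ("bench", "h3", True),
]

ratios = interactive_ratio_per_object(toy_pairs)
for obj, ratio in ratios.items():
    print(obj, ratio, is_near_a_mode(ratio))
```

In this toy image the ball's pairs are mostly non-interactive and the bench's pairs are all interactive, so both ratios fall near one of the two modes rather than in the middle.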
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Visual Attention and Saliency Detection
