Learning Human-Object Interaction as Groups
Jiajun Hong, Jianan Wei, Wenguan Wang

TL;DR
This paper introduces GroupHOI, a novel framework for human-object interaction detection that models interactions from a group perspective by leveraging geometric proximity and semantic similarity, improving performance on multiple benchmarks.
Contribution
The paper proposes a group-based relation modeling approach for HOI detection, incorporating learnable proximity clustering and enhanced transformer decoders for better context aggregation.
Findings
Outperforms state-of-the-art methods on HICO-DET and V-COCO benchmarks.
Achieves leading results on the challenging NVI-DET task.
Effectively models higher-order interactions within groups.
Abstract
Human-Object Interaction Detection (HOI-DET) aims to localize human-object pairs and identify their interactive relationships. To aggregate contextual cues, existing methods typically propagate information across all detected entities via self-attention mechanisms, or establish message passing between humans and objects with bipartite graphs. However, they primarily focus on pairwise relationships, overlooking that interactions in real-world scenarios often emerge from collective behaviors (multiple humans and objects engaging in joint activities). In light of this, we revisit relation modeling from a group view and propose GroupHOI, a framework that propagates contextual information in terms of geometric proximity and semantic similarity. To exploit the geometric proximity, humans and objects are grouped into distinct clusters using a learnable proximity estimator based on spatial…
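The idea of grouping humans and objects by geometric proximity can be illustrated with a minimal sketch. This is not the paper's method: the learnable proximity estimator is replaced here by a fixed distance threshold on box centers, and groups are formed as connected components. The function names (`box_center`, `group_by_proximity`), the `(x1, y1, x2, y2)` box format, and the `threshold` parameter are all assumptions made for illustration.

```python
import math
from itertools import combinations


def box_center(box):
    """Return the center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def group_by_proximity(boxes, threshold):
    """Group box indices whose centers are linked by distances <= threshold.

    Hypothetical stand-in for a learned proximity estimator: two entities
    are "close" if their centers are within `threshold`, and closeness is
    applied transitively via union-find.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    centers = [box_center(b) for b in boxes]
    for i, j in combinations(range(len(boxes)), 2):
        if math.dist(centers[i], centers[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two nearby boxes and one distant box (illustrative coordinates).
boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
groups = group_by_proximity(boxes, threshold=20.0)
```

With these illustrative boxes, the first two entities fall into one group and the distant one forms its own. A learned estimator would presumably replace the hard threshold with a predicted affinity between entities, but that detail is not recoverable from the truncated abstract.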
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Social Robot Interaction and HRI
