From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos

Tanqiu Qiao; Ruochen Li; Frederick W. B. Li; Hubert P. H. Shum

arXiv:2407.00917·cs.CV·July 24, 2024

Open Access

TL;DR

This paper introduces CATS, an end-to-end framework that combines geometric and visual features in a graph-based model to improve multi-person human-object interaction recognition in videos, achieving state-of-the-art results.

Contribution

The novel CATS framework effectively integrates geometric and visual features through graph modeling, advancing the understanding of complex human-object interactions in videos.

Findings

01 Achieves state-of-the-art performance on MPHOI-72 and CAD-120 datasets.

02 Effectively models dynamic relationships between humans and objects.

03 Bridges category-specific insights with scenery dynamics.

Abstract

Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects, which are essential for a comprehensive understanding of human behavior and intentions. While previous work has made significant strides, effectively integrating geometric and visual features to model dynamic relationships between humans and objects in a graph framework remains a challenge. In this work, we propose a novel end-to-end category-to-scenery framework, CATS, which first generates geometric features for each category through its own graph and then fuses them with the corresponding visual features. Subsequently, we construct a scenery interactive graph with these enhanced geometric-visual features as nodes to learn the relationships among human and object categories. This methodological advance facilitates a deeper, more structured comprehension of…
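The two-stage idea in the abstract (per-category graphs first, then one scene-level graph over the fused features) can be illustrated with a toy sketch. This is not the authors' implementation: the mean neighbour aggregation, the fusion by concatenation, the dot-product softmax affinities, and all names and feature values below are assumptions chosen only to make the data flow concrete.

```python
import math


def mean_aggregate(features, edges):
    """One round of mean neighbour aggregation over an undirected graph.

    Assumption: stands in for whatever per-category graph model is used.
    """
    neighbours = {i: [i] for i in range(len(features))}  # include a self loop
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(len(features))
    ]


def fuse(geometric, visual):
    """Fuse geometric and visual features by concatenation (an assumption)."""
    return [g + v for g, v in zip(geometric, visual)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scenery_graph_weights(nodes):
    """Row-normalised dot-product affinities between every pair of nodes."""
    return [
        softmax([sum(a * b for a, b in zip(u, v)) for v in nodes])
        for u in nodes
    ]


# Hypothetical toy scene: two humans and one object.
human_geo = [[0.0, 1.0], [1.0, 0.0]]
object_geo = [[0.5, 0.5]]
human_vis = [[0.2], [0.4]]
object_vis = [[0.9]]

# Stage 1: one graph per category, then fusion with visual features.
human_nodes = fuse(mean_aggregate(human_geo, edges=[(0, 1)]), human_vis)
object_nodes = fuse(mean_aggregate(object_geo, edges=[]), object_vis)

# Stage 2: a single scene-level graph whose nodes are the fused features.
scene_nodes = human_nodes + object_nodes
weights = scenery_graph_weights(scene_nodes)
```

Each row of `weights` sums to 1 and can be read as how strongly one entity attends to every entity in the scene, including itself. A real model would learn these relationships rather than compute them from fixed dot products.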

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications