A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos
Fatemeh Ziaeetabar, Reza Safabakhsh, Saeedeh Momtazi, Minija Tamosiunaite, Florentin Wörgötter

TL;DR
This paper presents a hierarchical graph-based method with layered attention mechanisms for improved recognition and detailed description generation of bimanual actions in videos, benefiting robotics and human-computer interaction.
Contribution
It introduces a novel 3-level GAT architecture combining scene graphs and hierarchical attention for enhanced action recognition and description in videos.
Findings
Outperforms state-of-the-art methods in accuracy, precision, and relevance.
Generates multiple semantic descriptions in parallel.
Provides deeper insights into bimanual hand-object interactions.
Abstract
A nuanced understanding of (bimanual) manipulation actions in videos, and the generation of detailed descriptions of them, is important for disciplines such as robotics, human-computer interaction, and video content analysis. This study describes a novel method that integrates graph-based modeling with layered hierarchical attention mechanisms, yielding more precise and more comprehensive video descriptions. To achieve this, we first encode the spatio-temporal interdependencies between objects and actions with scene graphs, and then combine these with a novel 3-level architecture that builds a hierarchical attention mechanism from Graph Attention Networks (GATs). The 3-level GAT architecture allows recognizing local as well as global contextual elements. This way, several descriptions with different semantic complexity can be generated in parallel for the same…
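The abstract names Graph Attention Networks as the core building block. The sketch below is not the authors' implementation: it shows a single-head GAT layer in the standard formulation of Veličković et al., applied to one hypothetical per-frame scene graph. The node names, "touching" edges, feature sizes and random weights are illustrative assumptions, and the paper's 3-level hierarchy is not reproduced.

```python
import numpy as np

# Hypothetical toy scene graph for one frame of a bimanual action
# (node names and edges are illustrative assumptions, not from the paper).
NODES = ["left_hand", "right_hand", "cup", "spoon"]
EDGES = [(0, 2), (1, 3), (2, 3)]  # undirected "touching" relations


def build_adjacency(n, edges):
    """Symmetric binary adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def elu(x):
    # expm1 on the clipped input avoids overflow in the unused branch
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer.

    h:   (N, F) node features
    adj: (N, N) binary adjacency; self-loops are added here
    W:   (F, F') shared linear projection
    a:   (2 * F',) attention vector
    """
    z = h @ W                                   # projected features, (N, F')
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), computed by splitting a in two halves
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    mask = (adj + np.eye(adj.shape[0])) > 0     # attend to neighbours and self only
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over neighbours
    return elu(alpha @ z), alpha


rng = np.random.default_rng(0)
adj = build_adjacency(len(NODES), EDGES)
h = rng.normal(size=(len(NODES), 8))            # random stand-in node features
W = rng.normal(size=(8, 4))
a = rng.normal(size=(8,))
out, alpha = gat_layer(h, adj, W, a)
```

Each row of `alpha` is a distribution over a node's neighbours, so the left hand only attends to itself and the cup it touches. A hierarchical design would stack or pool such layers at coarser levels; how the paper does this is not specified here.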
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Graph Neural Networks · Artificial Intelligence in Healthcare and Education
Methods: Graph Attention Network
