Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
Manuel Benavent-Lledo, David Mulero-Pérez, David Ortiz-Perez, Jose Garcia-Rodriguez, Antonis Argyros

TL;DR
This paper introduces a transformer-based approach that leverages hierarchical action structures and textual context to significantly improve action recognition accuracy in complex, real-world datasets.
Contribution
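It presents a hierarchical action recognition model that combines visual and textual features, trains it with a joint loss over coarse- and fine-grained actions, and extends the Toyota Smarthome Untrimmed (TSU) dataset with action hierarchies to form the Hierarchical TSU dataset.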
Findings
Achieves over 17% improvement in top-1 accuracy on multiple datasets.
Demonstrates the effectiveness of hierarchical and contextual information integration.
Outperforms state-of-the-art methods consistently.
Abstract
We propose a novel approach to improve action recognition by exploiting the hierarchical organization of actions and by incorporating contextualized textual information, including location and previous actions, to reflect the action's temporal context. To achieve this, we introduce a transformer architecture tailored for action recognition that employs both visual and textual features. Visual features are obtained from RGB and optical flow data, while text embeddings represent contextual information. Furthermore, we define a joint loss function to simultaneously train the model for both coarse- and fine-grained action recognition, effectively exploiting the hierarchical nature of actions. To demonstrate the effectiveness of our method, we extend the Toyota Smarthome Untrimmed (TSU) dataset by incorporating action hierarchies, resulting in the Hierarchical TSU dataset, a hierarchical…
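The abstract states that a joint loss trains the model for coarse- and fine-grained recognition at once, but does not give its form. As a minimal sketch only, the block below assumes the simplest plausible variant: a weighted sum of two cross-entropy terms, one per hierarchy level. The function names, the `coarse_weight` parameter, and the weighted-sum form itself are assumptions for illustration, not the authors' definition.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_hierarchical_loss(coarse_logits, fine_logits,
                            coarse_target, fine_target,
                            coarse_weight=0.5):
    """Hypothetical joint loss: weighted sum of coarse and fine cross-entropy.

    coarse_weight is an assumed hyperparameter in [0, 1]; the source does not
    specify how the two terms are balanced.
    """
    coarse_term = softmax_cross_entropy(coarse_logits, coarse_target)
    fine_term = softmax_cross_entropy(fine_logits, fine_target)
    return coarse_weight * coarse_term + (1.0 - coarse_weight) * fine_term


# Example: one sample with 2 coarse classes and 4 fine classes.
# The class counts and logit values are made up for the demonstration.
loss = joint_hierarchical_loss(
    coarse_logits=[2.0, 0.1],
    fine_logits=[1.5, 0.2, 0.1, -0.3],
    coarse_target=0,
    fine_target=0,
)
print(f"joint loss: {loss:.4f}")
```

Under this assumed form, a prediction that picks the wrong fine-grained action is still penalized by the coarse term when it also leaves the correct coarse category, which is one way a single objective could exploit the hierarchy.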
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
