Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

TL;DR
This paper introduces a hierarchical video recognition framework that encodes category dependencies and applies top-down constraints, significantly improving recognition accuracy, especially for fine-grained subcategories. It also provides a new medical dataset for benchmarking.
Contribution
The paper formalizes the task of hierarchical video recognition, proposes a novel video-language framework with hierarchical encoding and constraints, and introduces a new medical dataset for evaluation.
Findings
Outperforms conventional flat classification methods
Effective in recognizing fine-grained subcategories
Provides a new challenging benchmark dataset
Abstract
Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition. Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions. We further construct a new fine-grained dataset based on medical assessments for rehabilitation of stroke patients, serving as a challenging benchmark for hierarchical recognition. Through extensive experiments, we demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional…
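The top-down constraint mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the mechanism, so this assumes a two-level hierarchy in which the best-scoring parent category restricts which subcategories may be predicted. The function `top_down_predict`, the label names, and all scores are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List, Tuple


def top_down_predict(
    parent_scores: Dict[str, float],
    child_scores: Dict[str, float],
    children_of: Dict[str, List[str]],
) -> Tuple[str, str]:
    """Pick the best parent, then the best child among that parent's children.

    Illustrative only: a hard top-down filter over a two-level hierarchy.
    """
    # Step 1: choose the highest-scoring coarse category.
    best_parent = max(parent_scores, key=parent_scores.get)
    # Step 2: discard every subcategory that does not belong to that parent.
    allowed = children_of[best_parent]
    # Step 3: choose the highest-scoring subcategory among the survivors.
    best_child = max(allowed, key=lambda c: child_scores[c])
    return best_parent, best_child


# Hypothetical hierarchy and scores (not from the paper).
children_of = {
    "upper_limb": ["raise_arm", "grasp"],
    "lower_limb": ["step", "stand_up"],
}
parent_scores = {"upper_limb": 0.70, "lower_limb": 0.30}
child_scores = {"raise_arm": 0.35, "grasp": 0.15, "step": 0.40, "stand_up": 0.10}

# Flat classification ignores the hierarchy and takes the global maximum.
flat_prediction = max(child_scores, key=child_scores.get)

# Hierarchical prediction filters subcategories by the chosen parent first.
hier_prediction = top_down_predict(parent_scores, child_scores, children_of)

print(flat_prediction)  # step
print(hier_prediction)  # ('upper_limb', 'raise_arm')
```

In this toy case the flat prediction (`step`) contradicts the confident coarse-level prediction (`upper_limb`), while the filtered prediction stays consistent with it. A hard filter like this is only one possible design; a soft variant that reweights subcategory scores by parent scores would be another.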
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Anomaly Detection Techniques and Applications
