MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization
Zhenying Fang, Richang Hong

TL;DR
MGCA-Net introduces a multi-grained, category-aware approach for open-vocabulary temporal action localization, significantly improving recognition accuracy for both base and novel categories in videos.
Contribution
The paper proposes MGCA-Net, a network that combines a localizer, an action presence predictor, a conventional classifier, and a coarse-to-fine classifier. It targets a limitation of prior OV-TAL methods, which mostly recognize action categories at a single granularity.
Findings
Achieves state-of-the-art results on THUMOS'14 and ActivityNet-1.3.
Excels in zero-shot temporal action localization scenarios.
Enhances localization accuracy through multi-grained category awareness.
Abstract
Open-Vocabulary Temporal Action Localization (OV-TAL) aims to recognize and localize instances of any desired action categories in videos without explicitly curating training data for all categories. Existing methods mostly recognize action categories at a single granularity, which degrades the recognition accuracy of both base and novel action categories. To address these issues, we propose a Multi-Grained Category-Aware Network (MGCA-Net) comprising a localizer, an action presence predictor, a conventional classifier, and a coarse-to-fine classifier. Specifically, the localizer localizes category-agnostic action proposals. For these action proposals, the action presence predictor estimates the probability that they belong to an action instance. At the same time, the conventional classifier predicts the probability of each action proposal over base action categories at the snippet…
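As a rough illustration of how the components named in the abstract could fit together at inference time, the sketch below scores category-agnostic proposals by multiplying an action presence probability with a per-category probability. Everything here is an assumption made for illustration: the `Proposal` type, the `score_proposal` and `detect` helpers, the product fusion rule, the threshold, and the example numbers are hypothetical and are not taken from the paper. The coarse-to-fine classifier is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Proposal:
    """A category-agnostic action proposal (hypothetical structure)."""
    start: float                   # start time in seconds
    end: float                     # end time in seconds
    presence: float                # probability the span contains an action
    class_probs: Dict[str, float]  # category name -> probability


def score_proposal(p: Proposal) -> Tuple[str, float]:
    """Pick the most likely category and fuse it with the presence score.

    The product fusion is an assumption, not the paper's stated rule.
    """
    label = max(p.class_probs, key=p.class_probs.get)
    return label, p.presence * p.class_probs[label]


def detect(proposals: List[Proposal], threshold: float = 0.3):
    """Keep proposals whose fused score reaches the threshold, best first."""
    results = []
    for p in proposals:
        label, score = score_proposal(p)
        if score >= threshold:
            results.append((p.start, p.end, label, score))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Made-up example values, for illustration only.
proposals = [
    Proposal(2.0, 5.5, 0.9, {"high jump": 0.8, "long jump": 0.2}),
    Proposal(7.0, 8.0, 0.2, {"high jump": 0.5, "long jump": 0.5}),
]
detections = detect(proposals)
for start, end, label, score in detections:
    print(f"{start:.1f}s-{end:.1f}s  {label}  score={score:.2f}")
```

In this toy input, the second proposal is dropped because its low presence probability pulls the fused score under the threshold, which is the role the abstract assigns to the action presence predictor.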
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Action Observation and Synchronization
