Telling Stories for Common Sense Zero-Shot Action Recognition
Shreyank N Gowda, Laura Sevilla-Lara

TL;DR
This paper introduces Stories, a dataset of detailed textual narratives for action classes extracted from WikiHow articles. The narratives improve zero-shot video action recognition without fine-tuning on the target dataset, and the method reports state-of-the-art results on benchmarks.
Contribution
The paper presents the Stories dataset with rich textual descriptions and a novel method leveraging this data to enhance zero-shot action recognition performance.
Findings
Achieved up to 6.1% improvement in top-1 accuracy on benchmarks.
Introduced a new dataset with multi-sentence narratives for action classes.
Demonstrated the effectiveness of textual context in zero-shot transfer.
Abstract
Video understanding has long suffered from reliance on large labeled datasets, motivating research into zero-shot learning. Recent progress in language modeling presents opportunities to advance zero-shot video analysis, but constructing an effective semantic space relating action classes remains challenging. We address this by introducing a novel dataset, Stories, which contains rich textual descriptions for diverse action classes extracted from WikiHow articles. For each class, we extract multi-sentence narratives detailing the necessary steps, scenes, objects, and verbs that characterize the action. This contextual data enables modeling of nuanced relationships between actions, paving the way for zero-shot transfer. We also propose an approach that harnesses Stories to improve feature generation for training zero-shot classification. Without any target dataset fine-tuning, our method…
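The general idea of classifying in a text-derived semantic space can be illustrated with a minimal sketch. This is not the authors' method: it assumes each class narrative has already been encoded into per-sentence embeddings, that video features live in the same space, and that a class is represented by the mean of its sentence embeddings. The helper names (`class_prototype`, `zero_shot_predict`) and the toy 3-dimensional vectors are hypothetical and exist only for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def class_prototype(sentence_embeddings):
    """Mean-pool the sentence embeddings of one class narrative (an assumption)."""
    return l2_normalize(np.mean(sentence_embeddings, axis=0))


def zero_shot_predict(video_feature, prototypes):
    """Return the index of the most cosine-similar class prototype and all scores."""
    scores = prototypes @ l2_normalize(video_feature)
    return int(np.argmax(scores)), scores


# Toy embeddings standing in for encoded multi-sentence narratives (hypothetical data).
stories = {
    "make_coffee": np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "ride_bike": np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}
class_names = list(stories)
prototypes = np.stack([class_prototype(stories[c]) for c in class_names])

# A toy video feature assumed to be already projected into the text space.
video_feature = np.array([0.8, 0.3, 0.05])
pred_idx, scores = zero_shot_predict(video_feature, prototypes)
predicted = class_names[pred_idx]
```

In this sketch no labeled video of either class is needed at prediction time; only the class narratives define the label space, which is the sense in which richer text can help zero-shot transfer.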
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
