Telling Stories for Common Sense Zero-Shot Action Recognition

Shreyank N Gowda; Laura Sevilla-Lara

arXiv:2309.17327 · cs.CV · October 24, 2024

TL;DR

This paper introduces Stories, a dataset of multi-sentence textual narratives for action classes extracted from WikiHow articles, and uses it to improve zero-shot video action recognition. Without fine-tuning on the target dataset, the approach achieves state-of-the-art results on benchmarks.

Contribution

The paper presents the Stories dataset with rich textual descriptions and a novel method leveraging this data to enhance zero-shot action recognition performance.

Findings

01 Achieved up to 6.1% improvement in top-1 accuracy on zero-shot action recognition benchmarks.

02 Introduced Stories, a new dataset of multi-sentence narratives for action classes.

03 Demonstrated that richer textual context for each class is effective for zero-shot transfer.

Abstract

Video understanding has long suffered from reliance on large labeled datasets, motivating research into zero-shot learning. Recent progress in language modeling presents opportunities to advance zero-shot video analysis, but constructing an effective semantic space relating action classes remains challenging. We address this by introducing a novel dataset, Stories, which contains rich textual descriptions for diverse action classes extracted from WikiHow articles. For each class, we extract multi-sentence narratives detailing the necessary steps, scenes, objects, and verbs that characterize the action. This contextual data enables modeling of nuanced relationships between actions, paving the way for zero-shot transfer. We also propose an approach that harnesses Stories to improve feature generation for training zero-shot classification. Without any target dataset fine-tuning, our method…
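The idea of a semantic space relating action classes can be illustrated with a minimal sketch. This is not the authors' method (the abstract describes a feature-generation approach whose details are not given here); it only shows the generic zero-shot step of assigning a query to the unseen class whose textual description is nearest. The bag-of-words `embed` function is an assumption standing in for a real language-model encoder, the function names are hypothetical, and the class descriptions are invented for illustration rather than taken from the Stories dataset.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a language-model encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(query_text, class_descriptions):
    """Return the class whose description is closest to the query in the toy semantic space."""
    query = embed(query_text)
    return max(
        class_descriptions,
        key=lambda name: cosine(query, embed(class_descriptions[name])),
    )


# Invented multi-step descriptions, NOT drawn from the Stories dataset.
toy_descriptions = {
    "archery": "pick up the bow nock the arrow draw the string aim at the target release",
    "bowling": "pick up the ball walk to the lane swing the arm roll the ball at the pins",
}

print(predict("person draws a bow and aims an arrow at a target", toy_descriptions))
```

In a real pipeline the query would be a video feature projected into the same space as the text embeddings; a text query is used here only to keep the sketch self-contained. Longer descriptions give each class more distinguishing words (objects, steps, scenes), which is the intuition behind using narratives rather than bare class names.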

Code & Models

Repositories

kini5gowda/stories (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning