Unifying Few- and Zero-Shot Egocentric Action Recognition
Tyler R. Scott, Michael Shvartsman, Karl Ridgeway

TL;DR
This paper unifies few- and zero-shot learning for open-set egocentric action recognition and, on new splits derived from the EPIC-KITCHENS dataset, improves zero-shot classification without sacrificing few-shot performance.
Contribution
It proposes a new set of evaluation splits, derived from EPIC-KITCHENS, for open-set classification, and shows that adding a metric-learning loss to the conventional direct-alignment baseline improves zero-shot recognition without harming few-shot results.
Findings
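Adding a metric-learning loss to the direct-alignment baseline improves zero-shot classification by as much as 10%.
The new splits, derived from EPIC-KITCHENS, allow evaluation of open-set egocentric action recognition.
The unified approach reframes zero-shot generalization as cross-modal few-shot generalization, which suits realistic settings where the set of action classes is not fixed.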
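One way to picture the reframing of zero-shot recognition as cross-modal few-shot recognition is a nearest-prototype classifier whose support set can come from either modality. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes video and text embeddings already live in a shared space, it uses toy random data in place of real encoders, and the names build_prototypes and classify are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def build_prototypes(support_emb, support_labels, num_classes):
    """Average the support embeddings of each class into one prototype.

    The support set may hold video embeddings (few-shot) or one text
    embedding per class (zero-shot); the procedure is the same either way.
    """
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(protos)

def classify(query_emb, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    return (l2_normalize(query_emb) @ prototypes.T).argmax(axis=1)

# Toy stand-in for a shared embedding space (assumption: real encoders omitted).
rng = np.random.default_rng(0)
num_classes, dim = 3, 8
centers = np.eye(num_classes, dim)

def noisy(labels):
    """Hypothetical embeddings: the class center plus small Gaussian noise."""
    return centers[labels] + 0.05 * rng.standard_normal((len(labels), dim))

# Few-shot support: five labelled video embeddings per class.
video_support_labels = np.repeat(np.arange(num_classes), 5)
video_support = noisy(video_support_labels)

# Zero-shot support: a single text embedding per class.
text_labels = np.arange(num_classes)
text_support = noisy(text_labels)

query_labels = np.array([0, 1, 2, 2, 1, 0])
queries = noisy(query_labels)

few_shot_pred = classify(
    queries, build_prototypes(video_support, video_support_labels, num_classes)
)
zero_shot_pred = classify(
    queries, build_prototypes(text_support, text_labels, num_classes)
)
```

Only the source of the support embeddings changes between the two calls, which is the sense in which the two settings can share one evaluation procedure.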
Abstract
Although there has been significant research in egocentric action recognition, most methods and tasks, including EPIC-KITCHENS, suppose a fixed set of action classes. Fixed-set classification is useful for benchmarking methods, but is often unrealistic in practical settings due to the compositionality of actions, resulting in a functionally infinite-cardinality label set. In this work, we explore generalization with an open set of classes by unifying two popular approaches: few- and zero-shot generalization (the latter of which we reframe as cross-modal few-shot generalization). We propose a new set of splits derived from the EPIC-KITCHENS dataset that allow evaluation of open-set classification, and use these splits to show that adding a metric-learning loss to the conventional direct-alignment baseline can improve zero-shot classification by as much as 10%, while not sacrificing few-shot…
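The abstract names two training signals, a direct-alignment term and an added metric-learning term, but the text available here does not give their exact form. The following is a minimal sketch of how such a combined objective might look, assuming a squared-distance alignment between each video embedding and the text embedding of its class, and a cross-entropy over similarities to per-class video centroids as the metric-learning term. Both choices, along with the temperature and metric_weight parameters, are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def alignment_loss(video_emb, text_emb, labels):
    """Assumed alignment term: mean squared distance between each video
    embedding and the text embedding of its class (row `label` of text_emb)."""
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)[labels]
    return float(np.mean(np.sum((v - t) ** 2, axis=1)))

def metric_loss(video_emb, labels, temperature=0.1):
    """Assumed metric-learning term: cross-entropy over cosine similarities
    between each video embedding and the per-class centroids in the batch."""
    v = l2_normalize(video_emb)
    classes = np.unique(labels)
    centroids = l2_normalize(
        np.stack([v[labels == c].mean(axis=0) for c in classes])
    )
    logits = v @ centroids.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.searchsorted(classes, labels)  # map labels to centroid rows
    return float(-log_probs[np.arange(len(labels)), targets].mean())

def combined_loss(video_emb, text_emb, labels, metric_weight=1.0):
    """Alignment term plus a weighted metric-learning term."""
    return alignment_loss(video_emb, text_emb, labels) + metric_weight * metric_loss(
        video_emb, labels
    )

# Toy batch: three classes, two perfectly aligned video embeddings per class.
text_emb = np.eye(3)
labels = np.array([0, 0, 1, 1, 2, 2])
video_emb = text_emb[labels]

align_value = alignment_loss(video_emb, text_emb, labels)
metric_value = metric_loss(video_emb, labels)
total_value = combined_loss(video_emb, text_emb, labels, metric_weight=1.0)
```

The alignment term ties the two modalities together, while the metric term shapes the video embedding space on its own. Setting metric_weight to zero recovers an alignment-only baseline in this sketch.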
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
