Knowledge Guided Learning: Towards Open Domain Egocentric Action Recognition with Zero Supervision
Sathyanarayanan N. Aakur, Sanjoy Kundu, Nikhil Gunti

TL;DR
This paper introduces a novel approach for open-domain egocentric action recognition that leverages attention and commonsense knowledge for zero-shot learning, enabling the discovery of new actions without labeled data.
Contribution
The paper proposes a knowledge-guided learning framework, built on Pattern Theory, for zero-shot action recognition and object detection in egocentric videos.
Findings
Effective zero-shot recognition of novel actions and objects.
Competitive performance on GTEA Gaze datasets.
Demonstrates the potential of knowledge-guided self-supervised learning.
Abstract
Advances in deep learning have enabled the development of models that have exhibited a remarkable tendency to recognize and even localize actions in videos. However, they tend to experience errors when faced with scenes or examples beyond their initial training environment. Hence, they fail to adapt to new domains without significant retraining with large amounts of annotated data. In this paper, we propose to overcome these limitations by moving to an open-world setting by decoupling the ideas of recognition and reasoning. Building upon the compositional representation offered by Grenander's Pattern Theory formalism, we show that attention and commonsense knowledge can be used to enable the self-supervised discovery of novel actions in egocentric videos in an open-world setting, where data from the observed environment (the target domain) is open, i.e., the vocabulary is partially known…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
