CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition
Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach

TL;DR
CLASTER introduces a reinforcement learning-based clustering method for zero-shot action recognition, effectively generalizing to unseen classes and outperforming existing methods across multiple datasets.
Contribution
The paper presents a novel centroid-based clustering approach optimized with reinforcement learning for improved zero-shot action recognition.
Findings
Outperforms the state of the art on the UCF101, HMDB51, and Olympic Sports datasets.
Effective in both zero-shot and generalized zero-shot learning scenarios.
Performs competitively in image domain tasks.
Abstract
Zero-shot action recognition is the task of recognizing action classes without visual examples, only with a semantic embedding which relates unseen to seen classes. The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes. Neural networks can model the complex boundaries between visual classes, which explains their success as supervised models. However, in zero-shot learning, these highly specialized class boundaries may not transfer well from seen to unseen classes. In this paper we propose a centroid-based representation, which clusters visual and semantic representation, considers all training samples at once, and in this way generalizing well to instances from unseen classes. We optimize the clustering using Reinforcement Learning which we show is critical for our approach to work. We call the…
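The centroid-based idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes plain k-means for the initial clustering, a softmax over negative squared distances for the soft assignment, and a simple reward-signed nudge of the nearest centroid as a stand-in for the reinforcement learning step. The function names (`kmeans`, `centroid_representation`, `reward_update`) and the parameters `temperature` and `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means (an assumption; the summary does not name the clustering init)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=k, replace=False)
    centroids = np.array(features[idx], dtype=float)
    for _ in range(iters):
        # Squared distance from every sample to every centroid: shape (n, k).
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = features[assign == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(0)
    return centroids

def centroid_representation(features, centroids, temperature=1.0):
    """Append a softly assigned centroid to each feature vector."""
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -d / temperature
    logits -= logits.max(1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(1, keepdims=True)
    return np.concatenate([features, w @ centroids], axis=1)

def reward_update(centroids, x, reward, lr=0.1):
    """Hypothetical stand-in for the RL step: move the nearest centroid
    toward x when reward is +1 and away from x when reward is -1."""
    updated = np.array(centroids, dtype=float)
    j = ((updated - x) ** 2).sum(1).argmin()
    updated[j] += lr * reward * (x - updated[j])
    return updated
```

In this sketch the reward would come from whether a downstream classifier labels the sample correctly, so the centroids are shaped by classification outcomes over all training samples rather than by distances alone. The exact reward and update rule used by CLASTER are not given in this summary.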
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
