TL;DR
DASZL is a compositional, zero-shot activity recognition framework that models activities as compositions of dynamic action signatures. Detectors for unseen activities are composed on the fly, so recognition does not require end-to-end supervised training over the full label set.
Contribution
The paper presents a compositional approach to zero-shot activity recognition in which each activity detector is built from dynamic action signatures: simple first-principles state machines supported by deep-learned components.
Findings
Achieves new state-of-the-art results on Olympic Sports and UCF101 datasets.
Extends to zero-shot joint segmentation and classification in videos.
Demonstrates recognition in de-novo settings with object detectors.
Abstract
There are many realistic applications of activity recognition where the set of potential activity descriptions is combinatorially large. This makes end-to-end supervised training of a recognition system impractical as no training set is practically able to encompass the entire label set. In this paper, we present an approach to fine-grained recognition that models activities as compositions of dynamic action signatures. This compositional approach allows us to reframe fine-grained recognition as zero-shot activity recognition, where a detector is composed "on the fly" from simple first-principles state machines supported by deep-learned components. We evaluate our method on the Olympic Sports and UCF101 datasets, where our model establishes a new state of the art under multiple experimental paradigms. We also extend this method to form a unique framework for zero-shot joint segmentation…
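To make the idea of composing a detector "on the fly" from state machines more concrete, here is a minimal Python sketch. It is not the paper's implementation: the attribute names (`running`, `airborne`, `lying_down`), the threshold, and the helpers `make_detector` and `above` are all hypothetical, and the per-frame scores stand in for whatever the deep-learned components would output.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: each frame is a dict of hypothetical attribute scores in [0, 1],
# standing in for the outputs of deep-learned components.
Frame = Dict[str, float]
Predicate = Callable[[Frame], bool]


def above(name: str, threshold: float = 0.5) -> Predicate:
    """Build a predicate that is true when the named attribute exceeds the threshold."""
    return lambda frame: frame.get(name, 0.0) > threshold


def make_detector(stages: Sequence[Predicate]) -> Callable[[List[Frame]], bool]:
    """Compose a linear state machine that accepts when all stages occur in order."""
    def detect(frames: List[Frame]) -> bool:
        state = 0  # index of the next stage that still has to be satisfied
        for frame in frames:
            if state < len(stages) and stages[state](frame):
                state += 1  # advance by at most one stage per frame
        return state == len(stages)
    return detect


# A hypothetical activity described only by its ordered attributes; no
# activity-specific training data is used to build the detector.
high_jump = make_detector([above("running"), above("airborne"), above("lying_down")])

video = [
    {"running": 0.9},
    {"running": 0.8, "airborne": 0.2},
    {"airborne": 0.95},
    {"lying_down": 0.7},
]
detected = high_jump(video)
```

In this sketch a new activity is defined purely by listing its stages, which is the sense in which the detector is compositional; the real system's signatures, scoring, and temporal modeling may differ substantially.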