Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation
Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata

TL;DR
This paper introduces a new approach for few-shot video classification that leverages spatiotemporal features, video retrieval, and feature generation, significantly improving performance on more realistic benchmarks involving multiple novel classes.
Contribution
It proposes a simple two-stage baseline with 3D CNN features and introduces novel retrieval and generative methods to enhance few-shot video classification performance.
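The second stage of such a two-stage baseline can be illustrated with a minimal sketch. It assumes the first stage (a 3D CNN trained on base classes) has already produced fixed feature vectors for each clip. The tiny hand-written vectors, the function names, and the hyperparameters below are hypothetical placeholders, not values from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear scores w_c . x + b_c for every class c."""
    return [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_linear_classifier(features, labels, num_classes,
                            lr=0.5, epochs=200, seed=0):
    """Stage 2 (assumed form): fit a softmax classifier on frozen features.

    Uses plain per-sample gradient descent on the cross-entropy loss.
    lr, epochs, and seed are illustrative choices only.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Placeholder "clip features" for two novel classes, two shots each.
novel_features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
novel_labels = [0, 0, 1, 1]
weights, biases = train_linear_classifier(novel_features, novel_labels,
                                          num_classes=2)
print(predict(weights, biases, [0.8, 0.1]))
print(predict(weights, biases, [0.1, 0.9]))
```

Keeping the features frozen and training only the classifier is one plausible reading of "fine-tuning the classifiers on novel classes"; the paper itself should be consulted for the exact training procedure.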
Findings
The two-stage baseline outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
Retrieval-based approach improves accuracy with tag-labeled videos.
Feature generation with GANs enhances recognition of novel classes.
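The retrieval finding above describes two steps: retrieve candidate videos by tag, then keep the clips that look most like the few labeled examples. The sketch below illustrates that idea under stated assumptions: the clip pool, its field names, the averaging of support features into a single prototype, and the use of cosine similarity are all hypothetical choices made for illustration, since the summary only says "visual similarities".

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_clips(class_name, support_features, pool, top_k=2):
    """Select extra clips for a novel class from a tag-labeled pool.

    Step 1: keep clips whose tags contain the class name.
    Step 2: rank the survivors by similarity to the mean support feature
            (an assumed stand-in for the paper's visual similarity).
    """
    candidates = [clip for clip in pool if class_name in clip["tags"]]
    dim = len(support_features[0])
    prototype = [sum(f[d] for f in support_features) / len(support_features)
                 for d in range(dim)]
    ranked = sorted(candidates,
                    key=lambda clip: cosine(clip["feature"], prototype),
                    reverse=True)
    return [clip["id"] for clip in ranked[:top_k]]


# Hypothetical tag-labeled pool with placeholder 2-D features.
pool = [
    {"id": "a", "tags": {"juggling"}, "feature": [0.9, 0.1]},
    {"id": "b", "tags": {"juggling", "street"}, "feature": [0.1, 0.9]},
    {"id": "c", "tags": {"cooking"}, "feature": [1.0, 0.0]},
    {"id": "d", "tags": {"juggling"}, "feature": [0.7, 0.3]},
]
support = [[1.0, 0.0], [0.8, 0.2]]  # features of the few labeled examples

print(retrieve_clips("juggling", support, pool))  # ['a', 'd']
```

Clip "c" is visually close to the support examples but is excluded by the tag filter, while clip "b" passes the tag filter but ranks last on similarity, which shows why the two steps complement each other.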
Abstract
Few-shot learning aims to recognize novel classes from a few examples. Although significant progress has been made in the image domain, few-shot video classification is relatively unexplored. We argue that previous methods underestimate the importance of video feature learning and propose to learn spatiotemporal features using a 3D CNN. Proposing a two-stage approach that learns video features on base classes followed by fine-tuning the classifiers on novel classes, we show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks. To circumvent the need of labeled examples, we present two novel approaches that yield further improvement. First, we leverage tag-labeled videos from a large dataset using tag retrieval followed by selecting the best clips with visual similarities. Second, we learn generative…
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
