Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
Jay Patravali, Gaurav Mittal, Ye Yu, Fuxin Li, Mei Chen

TL;DR
MetaUVFS is an unsupervised meta-learning approach for video few-shot action recognition. It trains on over 550K unlabeled videos with a novel action-appearance alignment module, outperforms all unsupervised methods on few-shot benchmarks, and can sometimes outperform supervised methods.
Contribution
It introduces MetaUVFS, the first unsupervised meta-learning algorithm for video few-shot action recognition, with a novel Action-Appearance Aligned Meta-adaptation module.
Findings
Outperforms all unsupervised methods on few-shot benchmarks.
Requires no labeled base classes or supervised pretraining.
Can sometimes outperform supervised methods on popular datasets.
Abstract
We present MetaUVFS as the first Unsupervised Meta-learning algorithm for Video Few-Shot action recognition. MetaUVFS leverages over 550K unlabeled videos to train a two-stream 2D and 3D CNN architecture via contrastive learning to capture the appearance-specific spatial and action-specific spatio-temporal video features respectively. MetaUVFS comprises a novel Action-Appearance Aligned Meta-adaptation (A3M) module that learns to focus on the action-oriented video features in relation to the appearance features via explicit few-shot episodic meta-learning over unsupervised hard-mined episodes. Our action-appearance alignment and explicit few-shot learner condition the unsupervised training to mimic the downstream few-shot task, enabling MetaUVFS to significantly outperform all unsupervised methods on few-shot benchmarks. Moreover, unlike previous few-shot action recognition methods…
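The abstract says the two streams are trained via contrastive learning, but the truncated text does not state the loss itself. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive objective in NumPy. It is not the authors' implementation: the function name `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are all assumptions made for the example.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss (illustrative; not taken from the paper).

    z_a[i] and z_b[i] are embeddings of two views of the same video
    (a positive pair). Every other pairing in the batch is treated
    as a negative.
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities, shape (batch, batch); hypothetical temperature.
    logits = z_a @ z_b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_prob)))


# Usage: matched views should score a lower loss than unrelated embeddings.
rng = np.random.default_rng(0)
views = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(views, views)
loss_unrelated = info_nce_loss(views, rng.normal(size=(8, 16)))
```

In this toy setup the loss is small when the two batches agree and larger when they are unrelated, which is the behaviour a contrastive pretraining stage relies on.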
Taxonomy
Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Domain Adaptation and Few-Shot Learning
Methods: 3-Dimensional Convolutional Neural Network · Contrastive Learning
