Task-adaptive Spatial-Temporal Video Sampler for Few-shot Action Recognition
Huabin Liu, Weixian Lv, John See, Weiyao Lin

TL;DR
This paper introduces a task-adaptive spatial-temporal video sampler that improves few-shot action recognition by intelligently selecting and emphasizing critical frames and regions, leading to better utilization of limited video data.
Contribution
It proposes a novel, differentiable, task-specific video frame sampler with a temporal selector and spatial amplifier, enhancing few-shot action recognition performance.
Findings
Significant performance improvements on multiple benchmarks.
Effective selection of key frames and regions enhances recognition.
End-to-end trainable sampler integrates seamlessly with existing methods.
Abstract
A primary challenge faced in few-shot action recognition is inadequate video data for training. To address this issue, current methods in this field mainly focus on devising algorithms at the feature level while little attention is paid to processing input video data. Moreover, existing frame sampling strategies may omit critical action information in temporal and spatial dimensions, which further impacts video utilization efficiency. In this paper, we propose a novel video frame sampler for few-shot action recognition to address this issue, where task-specific spatial-temporal frame sampling is achieved via a temporal selector (TS) and a spatial amplifier (SA). Specifically, our sampler first scans the whole video at a small computational cost to obtain a global perception of video frames. The TS plays its role in selecting top-T frames that contribute most significantly and…
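The top-T selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each frame already has a scalar importance score; the function name `select_top_t_frames` and the example scores are hypothetical and do not come from the paper. This is a plain hard top-k, which is not differentiable, so it only illustrates the idea of keeping the T highest-scoring frames and does not reproduce the paper's differentiable sampler.

```python
from typing import List, Sequence


def select_top_t_frames(scores: Sequence[float], t: int) -> List[int]:
    """Return the indices of the t highest-scoring frames, in temporal order.

    `scores` holds one hypothetical importance score per frame.
    """
    if not 0 < t <= len(scores):
        raise ValueError("t must be between 1 and the number of frames")
    # Rank frame indices by score, highest first. Python's sort is stable,
    # so tied scores keep the earlier frame first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the best t frames and restore their original temporal order.
    return sorted(ranked[:t])


# Illustrative scores for a 6-frame clip (made-up values).
frame_scores = [0.10, 0.80, 0.30, 0.95, 0.20, 0.60]
print(select_top_t_frames(frame_scores, t=3))  # prints [1, 3, 5]
```

Returning the indices in temporal order keeps the selected frames usable as a shortened clip rather than a score-sorted set.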
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging
Methods: Spatio-temporal stability analysis
