Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition
Jiazheng Xing, Mengmeng Wang, Yong Liu, Boyu Mu

TL;DR
This paper introduces SloshNet, a novel framework for few-shot action recognition that effectively models both low-level spatial features and short-term temporal relations, leading to improved performance across multiple datasets.
Contribution
The paper proposes a feature fusion architecture search module, together with separate long-term and short-term temporal modules, to enhance spatial-temporal modeling in few-shot action recognition.
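To make the idea of searching for a combination of low-level and high-level features more concrete, here is a minimal sketch, not the paper's module. It assumes each backbone level has already been reduced to a vector of the same length and that the search is relaxed into one learnable logit per level, turned into mixing weights with a softmax. The names `fuse_levels` and `alphas` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(features, alphas):
    """Blend per-level feature vectors with softmax-normalised weights.

    features: list of equal-length vectors, one per backbone level
              (assumed to be projected to a common size beforehand).
    alphas:   one logit per level; in a real search these would be learned.
    """
    if len(features) != len(alphas):
        raise ValueError("need exactly one logit per feature level")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature levels must have the same length")
    weights = softmax(alphas)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


# Two levels with equal logits contribute equally to the fused vector.
low_level = [0.0, 0.0]
high_level = [2.0, 4.0]
fused, weights = fuse_levels([low_level, high_level], [0.0, 0.0])
```

Raising one logit relative to the others shifts the fused vector toward that level, which is the behavior a learned search would exploit.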
Findings
Achieves state-of-the-art results on four datasets
Effectively models local semantic and motion features
Outperforms existing methods in accuracy
Abstract
Spatial and temporal modeling is one of the core aspects of few-shot action recognition. Most previous works focus on long-term temporal relation modeling based on high-level spatial representations, without considering the crucial low-level spatial features and short-term temporal relations. The former carry rich local semantic information, and the latter capture the motion characteristics of adjacent frames. In this paper, we propose SloshNet, a new framework that revisits spatial and temporal modeling for few-shot action recognition in a finer manner. First, to exploit the low-level spatial features, we design a feature fusion architecture search module that automatically searches for the best combination of low-level and high-level spatial features. Next, inspired by the recent transformer, we introduce a long-term…
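The abstract ties short-term temporal relations to the motion characteristics of adjacent frames. One simple way to picture this, offered as an illustrative stand-in and not as SloshNet's short-term module, is to take the difference between the feature vectors of consecutive frames. The function name `adjacent_frame_motion` is hypothetical.

```python
def adjacent_frame_motion(frames):
    """Return per-step differences between consecutive frame feature vectors.

    frames: list of T equal-length feature vectors, one per sampled frame.
    Result: list of T - 1 vectors, where entry t is frames[t + 1] - frames[t].
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to compute motion")
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# Three frames with 2-dimensional features give two motion vectors.
clip = [[0, 0], [1, 2], [3, 2]]
motion = adjacent_frame_motion(clip)
```

Each output vector depends only on two neighboring frames, in contrast to long-term modeling, which relates frames across the whole clip.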
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Diabetic Foot Ulcer Assessment and Management
