SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

Wenbo Huang; Jinghui Zhang; Xuwei Qian; Zhen Wu; Meng Wang; Lei Zhang

arXiv:2407.16344 · cs.CV · March 25, 2026

TL;DR

This paper introduces SOAP, a novel architecture for few-shot action recognition that captures comprehensive spatio-temporal and motion information using frame tuples, achieving state-of-the-art results across multiple benchmarks.

Contribution

Proposes SOAP-Net, a plug-and-play architecture that enhances spatio-temporal relation modeling and motion information capturing in few-shot action recognition.

Findings

01. Achieves new state-of-the-art performance on SthSthV2, Kinetics, UCF101, and HMDB51.

02. Demonstrates robustness and generalization across benchmarks.

03. Shows the effectiveness of frame tuples with diverse frame counts.

Abstract

High frame-rate (HFR) videos improve fine-grained expression in action recognition but reduce the density of spatio-temporal relation and motion information. As a result, traditional data-driven training continually requires large numbers of video samples. However, samples are not always sufficient in real-world scenarios, which motivates research on few-shot action recognition (FSAR). We observe that most recent FSAR works build the spatio-temporal relation of video samples via temporal alignment after spatial feature extraction, which separates spatial and temporal features within samples. They also capture motion information from a narrow perspective between adjacent frames, without considering density, which leads to insufficient motion information capturing. Therefore, we propose a novel plug-and-play architecture for FSAR called Spatio-tempOral frAme tuPle enhancer (SOAP). The model we…
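
As a rough illustration of the contrast the abstract draws, the Python sketch below groups frame indices into tuples of several sizes and computes a crude motion cue across each tuple instead of across adjacent frames alone. It is not the SOAP implementation. The sliding-window grouping, the first-to-last frame difference, and the names frame_tuples, multi_size_tuples, and tuple_motion are all assumptions made for illustration; the official repository is the reference for the actual method.

```python
from typing import Dict, List, Sequence, Tuple


def frame_tuples(num_frames: int, tuple_size: int) -> List[Tuple[int, ...]]:
    """Return every window of `tuple_size` consecutive frame indices.

    Assumption: tuples are contiguous sliding windows with stride 1.
    """
    if tuple_size < 1 or tuple_size > num_frames:
        raise ValueError("tuple_size must be between 1 and num_frames")
    return [
        tuple(range(start, start + tuple_size))
        for start in range(num_frames - tuple_size + 1)
    ]


def multi_size_tuples(
    num_frames: int, sizes: Sequence[int]
) -> Dict[int, List[Tuple[int, ...]]]:
    """Build frame tuples for several frame counts at once."""
    return {size: frame_tuples(num_frames, size) for size in sizes}


def tuple_motion(
    frames: Sequence[Sequence[float]], indices: Tuple[int, ...]
) -> List[float]:
    """Crude motion cue for one tuple (an assumption, not the paper's operator).

    Takes the element-wise difference between the last and first frame
    of the tuple, so longer tuples span a wider temporal range.
    """
    first, last = frames[indices[0]], frames[indices[-1]]
    return [b - a for a, b in zip(first, last)]


if __name__ == "__main__":
    # Three toy "frames", each flattened to two feature values.
    frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
    for size, tuples in multi_size_tuples(len(frames), [2, 3]).items():
        for indices in tuples:
            print(size, indices, tuple_motion(frames, indices))
```

With a tuple size of 2 this reduces to the adjacent-frame difference; larger sizes widen the temporal span each motion cue covers, which is the general intuition behind using tuples with diverse frame counts.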

Code & Models

Repositories

wenbohuang1002/soap (PyTorch, official)
