Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment
Fengshun Wang, Qiurui Wang, Peilin Zhao

TL;DR
This paper introduces a two-stream Mamba pyramid network designed to accurately evaluate figure skating performances by separately assessing technical and artistic elements, effectively handling long videos and diverse action scales.
Contribution
The proposed method evaluates TES and PCS in separate streams, combining multi-level fusion with multi-scale pyramids, and improves accuracy and efficiency over prior approaches that predict both scores from shared features.
Findings
Achieves state-of-the-art performance on the FineFS benchmark.
Effectively localizes and evaluates actions across various temporal scales.
Handles long videos with linear computational complexity.
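To illustrate why a state-space model can process a long sequence in linear time, here is a minimal sketch of a diagonal linear state-space recurrence. It is not the authors' network: the function name `linear_ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions made for illustration, and an actual Mamba block additionally makes these parameters depend on the input.

```python
import numpy as np

def linear_ssm_scan(x, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c . h_t over a 1-D sequence.

    x: (T,) input sequence; a, b, c: (N,) diagonal state parameters
    (hypothetical, fixed values; not taken from the paper).
    The loop visits each time step once, so cost grows linearly with T.
    """
    h = np.zeros_like(a, dtype=float)
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t  # state update, O(N) work per step
        y[t] = c @ h         # scalar readout from the hidden state
    return y
```

Doubling the number of frames doubles the work in this sketch, whereas full self-attention compares every pair of frames and so scales quadratically with sequence length.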
Abstract
Technical Element Score (TES) and Program Component Score (PCS) evaluations in figure skating demand precise assessment of athletic actions and artistic interpretation, respectively. Existing methods face three major challenges. First, previous works treat video and audio cues as common features for both TES and PCS prediction, without considering the established evaluation criteria of figure skating. Second, action elements in competitions are separated in time, and TES should be derived from each element's score, yet existing methods predict an overall TES without evaluating each action element. Third, lengthy competition videos make it difficult and inefficient to handle long-range contexts. To address these challenges, we propose a two-stream Mamba pyramid network that aligns with actual judging criteria to predict TES and PCS by separating visual-feature based…
