Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

Jingfei Xia; Mingchen Zhuge; Tiantian Geng; Shun Fan; Yuantai Wei; Zhenyu He; Feng Zheng

arXiv:2203.03990 · cs.CV · December 20, 2022 · 1 citation

TL;DR

Skating-Mixer is a multimodal MLP-based architecture for scoring long figure skating videos. It models the relationship between the skater's movements and the background music, captures rapid movements, and achieves state-of-the-art results on the Fis-V and FS1000 datasets.

Contribution

The paper introduces Skating-Mixer, an MLP-based model with a memory recurrent unit (MRU) for long-term audio-visual modeling in figure skating, along with FS1000, a dataset of over 1000 videos covering 8 types of programs with 7 rating metrics.

Findings

01. Achieves state-of-the-art performance on Fis-V and FS1000 datasets.

02. Effectively models rapid movements and long-term dependencies.

03. Proves applicability in Olympic competition analysis.

Abstract

Figure skating scoring is challenging because it requires judging the technical moves of the skaters as well as their coordination with the background music. Most learning-based methods cannot solve it well for two reasons: 1) each move in figure skating changes quickly, so simply applying traditional frame sampling loses a lot of valuable information, especially in videos that are 3 to 5 minutes long; 2) prior methods have rarely considered the critical audio-visual relationship in their models. For these reasons, we introduce a novel architecture named Skating-Mixer. It extends the MLP framework to a multimodal setting and effectively learns long-term representations through our designed memory recurrent unit (MRU). Aside from the model, we collected a high-quality audio-visual FS1000 dataset, which contains over 1000 videos on 8 types of programs with 7 different rating metrics,…
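To make the first reason in the abstract concrete, the sketch below works through the arithmetic of sparse frame sampling on a long video and contrasts it with processing every clip in order while carrying a running memory. It is only an illustration under stated assumptions: the 32-frame budget, the function names, and the fixed `gate` blend are hypothetical, and the memory update is a simple stand-in rather than the paper's MRU.

```python
from typing import List, Sequence


def uniform_sample_gap_seconds(duration_s: float, num_samples: int) -> float:
    """Seconds between consecutive frames when num_samples frames are
    drawn uniformly from a video of duration_s seconds."""
    return duration_s / num_samples


def split_into_clips(num_frames: int, clip_len: int) -> List[range]:
    """Cover every frame with consecutive, non-overlapping clips
    (the last clip may be shorter)."""
    return [
        range(start, min(start + clip_len, num_frames))
        for start in range(0, num_frames, clip_len)
    ]


def run_with_memory(clip_features: Sequence[Sequence[float]],
                    gate: float = 0.5) -> List[float]:
    """Fold per-clip feature vectors into one memory vector, clip by clip.

    Assumption: a fixed blend `gate` stands in for a learned recurrent
    update; this is NOT the memory recurrent unit from the paper.
    """
    memory = [0.0] * len(clip_features[0])
    for feat in clip_features:
        memory = [gate * m + (1.0 - gate) * f for m, f in zip(memory, feat)]
    return memory


# A 4-minute program sampled with a hypothetical budget of 32 frames
# leaves a gap of several seconds between sampled frames.
gap = uniform_sample_gap_seconds(duration_s=240.0, num_samples=32)

# Clip-wise processing instead visits every frame of the video.
clips = split_into_clips(num_frames=10, clip_len=4)

# Toy 2-dimensional clip features, folded into a single memory vector.
final_memory = run_with_memory([[1.0, 0.0], [1.0, 2.0]], gate=0.5)
```

With these toy numbers the gap between sampled frames is 7.5 seconds, long enough to skip a fast element entirely, whereas the clip-wise loop touches every frame and keeps a summary of everything seen so far.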

Code & Models

Repositories

andyfrancesco29/audio-visual-figure-skating (PyTorch, official)

Videos

Skating-Mixer: Long-term Sport Audio-Visual Modeling with MLPs

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization