Can pre-trained Deep Learning models predict groove ratings?
Axel Marmoret, Nicolas Farrugia, Jan Alexander Stupacher

TL;DR
This paper evaluates how well audio embeddings from seven pre-trained deep learning models predict groove ratings, compares them with traditional handcrafted audio features, and uses source-separated instruments to analyze style-dependent effects across funk, pop, and rock.
Contribution
It demonstrates that deep audio representations can effectively encode complex groove characteristics, surpassing traditional handcrafted features.
Findings
Deep learning models accurately predict groove ratings across styles.
Deep audio features outperform traditional features in groove prediction.
Source-separated analysis reveals style-dependent groove components.
Abstract
This study explores the extent to which deep learning models can predict groove and its related perceptual dimensions directly from audio signals. We critically examine the effectiveness of seven state-of-the-art deep learning models in predicting groove ratings and responses to groove-related queries through the extraction of audio embeddings. Additionally, we compare these predictions with traditional handcrafted audio features. To better understand the underlying mechanics, we extend this methodology to analyze predictions based on source-separated instruments, thereby isolating the contributions of individual musical elements. Our analysis reveals a clear separation of groove characteristics driven by the underlying musical style of the tracks (funk, pop, and rock). These findings indicate that deep audio representations can successfully encode complex, style-dependent groove…
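The embedding-based prediction setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic stand-ins, the regressor (closed-form ridge regression), the evaluation metric (Pearson correlation under 5-fold cross-validation), and the helper names `ridge_fit` and `cross_val_pearson` are all assumptions chosen for illustration. In practice the `embeddings` array would hold one vector per track extracted from a pre-trained audio model, and `ratings` would hold the listeners' groove ratings.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def cross_val_pearson(X, y, n_folds=5, alpha=1.0, seed=0):
    """Pearson correlation between held-out predictions and true ratings."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        w, b = ridge_fit(X[train_mask], y[train_mask], alpha)
        preds[test_idx] = X[test_idx] @ w + b
    return float(np.corrcoef(preds, y)[0, 1])


# Synthetic stand-ins (assumption): random "embeddings" and ratings that
# depend linearly on them, plus a small noisy "handcrafted" feature set.
rng = np.random.default_rng(42)
n_tracks, emb_dim = 120, 32
embeddings = rng.normal(size=(n_tracks, emb_dim))
ratings = embeddings @ rng.normal(size=emb_dim) + rng.normal(scale=0.5, size=n_tracks)
handcrafted = embeddings[:, :4] + rng.normal(scale=1.0, size=(n_tracks, 4))

r_emb = cross_val_pearson(embeddings, ratings)
r_hand = cross_val_pearson(handcrafted, ratings)
print(f"embeddings: r = {r_emb:.2f}, handcrafted: r = {r_hand:.2f}")
```

Because the synthetic ratings are generated from the embeddings, the embeddings win here by construction; the sketch shows only the shape of the comparison and says nothing about the paper's actual results. The same two functions could be applied to embeddings computed from individual source-separated stems to mirror the per-instrument analysis.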
