Learning Normal Patterns in Musical Loops
Shayan Dadman, Bernt Arild Bremdal, Børre Bang, and Rune Dalmo

TL;DR
This paper presents an unsupervised deep learning framework for detecting audio patterns in musical loops, combining a hierarchical transformer with anomaly detection to improve pattern recognition without manual feature engineering.
Contribution
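The paper introduces an architecture that integrates a pre-trained Hierarchical Token-semantic Audio Transformer (HTS-AT), a Feature Fusion Mechanism (FFM), and one-class Deep SVDD for unsupervised audio pattern detection. It targets three limitations of prior methods: reliance on handcrafted features, domain-specific constraints, and dependence on iterative user interaction.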
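This summary does not describe how the FFM works. As a hypothetical illustration of the problem it addresses (turning variable-length loops into fixed-size encoder input), the sketch below builds four fixed-length views from a sequence of spectrogram frames: one downsampled global view of the whole loop plus crops from the front, middle, and back. Short loops are tiled to length. This global-plus-local-crop layout is an assumption, not the paper's FFM. The function names are hypothetical, frames are plain lists of floats, and the learned layer that would fuse the views is omitted.

```python
from typing import List

Frame = List[float]  # one spectrogram frame (hypothetical representation)


def repeat_pad(frames: List[Frame], target_len: int) -> List[Frame]:
    """Tile a short loop end-to-end until it is exactly target_len frames long."""
    if not frames:
        raise ValueError("loop has no frames")
    out: List[Frame] = []
    while len(out) < target_len:
        out.extend(frames)
    return out[:target_len]


def downsample(frames: List[Frame], target_len: int) -> List[Frame]:
    """Pick target_len evenly spaced frames so the view spans the whole loop."""
    n = len(frames)
    return [frames[(i * n) // target_len] for i in range(target_len)]


def fusion_views(frames: List[Frame], target_len: int) -> List[List[Frame]]:
    """Return [global, front, middle, back] views, each target_len frames long.

    Loops no longer than target_len are tiled, and the same tiled view is
    repeated four times so every loop yields the same number of views.
    """
    if len(frames) <= target_len:
        padded = repeat_pad(frames, target_len)
        return [padded, padded, padded, padded]
    n = len(frames)
    mid = (n - target_len) // 2
    return [
        downsample(frames, target_len),  # global, coarse view of the whole loop
        frames[:target_len],             # local crop at the start
        frames[mid:mid + target_len],    # local crop in the middle
        frames[n - target_len:],         # local crop at the end
    ]


if __name__ == "__main__":
    loop = [[float(i)] for i in range(10)]  # a 10-frame toy "loop"
    for view in fusion_views(loop, 4):
        print([f[0] for f in view])
    # [0.0, 2.0, 5.0, 7.0]
    # [0.0, 1.0, 2.0, 3.0]
    # [3.0, 4.0, 5.0, 6.0]
    # [6.0, 7.0, 8.0, 9.0]
```

Each view could then be passed through the encoder separately; how the paper's FFM weights or merges such views is not stated in this summary.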
Findings
Deep SVDD combined with a residual autoencoder improves anomaly detection.
The approach outperforms traditional anomaly detection baselines such as PCA and Isolation Forest.
The framework enables flexible, fully unsupervised pattern detection across diverse audio samples.
Abstract
This paper introduces an unsupervised framework for detecting audio patterns in musical samples (loops) through anomaly detection techniques, addressing challenges in music information retrieval (MIR). Existing methods are often constrained by reliance on handcrafted features, domain-specific limitations, or dependence on iterative user interaction. We address these limitations through an architecture combining deep feature extraction with unsupervised anomaly detection. Our approach leverages a pre-trained Hierarchical Token-semantic Audio Transformer (HTS-AT), paired with a Feature Fusion Mechanism (FFM), to generate representations from variable-length audio loops. These embeddings are processed using one-class Deep Support Vector Data Description (Deep SVDD), which learns normative audio patterns by mapping them to a compact latent hypersphere. Evaluations on curated bass and guitar…
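The abstract describes Deep SVDD as mapping normal loops into a compact latent hypersphere. In the original one-class Deep SVDD formulation, a network φ without bias terms is trained to minimize the mean squared distance of its outputs to a fixed center c, plus weight decay, and a new input's anomaly score is its squared distance to c. That formulation also pretrains the encoder with an autoencoder and then fixes c as the mean of the initial network outputs; the residual autoencoder mentioned in the findings presumably plays this pretraining role, but that is an assumption. The sketch below shows only center initialization, the objective, and the score. The toy bias-free linear map stands in for the real network, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Network = Callable[[Vector], Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_center(phi: Network, data: Sequence[Vector], eps: float = 0.1) -> Vector:
    """Fix c as the mean of the network's initial outputs on the training data.

    Nonzero coordinates within eps of zero are pushed out to -eps or +eps
    (exact zeros are left as-is), so a near-zero center coordinate cannot be
    matched trivially by zero weights.
    """
    outputs = [phi(x) for x in data]
    c = [sum(o[k] for o in outputs) / len(outputs) for k in range(len(outputs[0]))]
    for k, v in enumerate(c):
        if abs(v) < eps and v < 0:
            c[k] = -eps
        elif abs(v) < eps and v > 0:
            c[k] = eps
    return c


def one_class_objective(phi: Network, data: Sequence[Vector], c: Vector,
                        weights: Sequence[float], weight_decay: float) -> float:
    """(1/n) * sum_i ||phi(x_i) - c||^2 + (weight_decay / 2) * ||weights||^2."""
    dist = sum(sq_dist(phi(x), c) for x in data) / len(data)
    reg = 0.5 * weight_decay * sum(w * w for w in weights)
    return dist + reg


def anomaly_score(phi: Network, x: Vector, c: Vector) -> float:
    """Larger squared distance to c means the loop looks less 'normal'."""
    return sq_dist(phi(x), c)


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # toy bias-free linear map (hypothetical)

    def phi(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    normal = [[1.0, 1.0, 0.0], [1.2, 0.8, 0.0], [0.8, 1.2, 0.0]]
    c = init_center(phi, normal)
    print(c, one_class_objective(phi, normal, c, [w for row in W for w in row], 1e-3))
    print(anomaly_score(phi, [1.0, 1.0, 0.0], c), anomaly_score(phi, [-2.0, 3.0, 2.0], c))
```

Here c is computed once and then held fixed while φ's weights are optimized. A threshold on the score, which this summary does not specify, would turn scores into normal or anomalous decisions.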
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Attention Is All You Need · Linear Layer · Adam · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Byte Pair Encoding
