The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification
Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

TL;DR
This paper shows that using perceptual metrics as loss functions in autoencoder training improves music genre classification. Features from the trained autoencoders outperform using the metrics directly as distances, which suggests better generalization to new signals.
Contribution
It proposes using perceptual metrics as loss functions when training autoencoders for music representation learning, and shows that this improves genre classification performance.
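To make "perceptual metric as a loss function" concrete, here is a minimal sketch, not the authors' implementation. It assumes a global (single-window) SSIM as a stand-in perceptual metric, since this summary does not say which metrics the paper uses, and it treats a spectrogram as a flattened list of values in [0, 1]. The names `ssim`, `ssim_loss` and `mse` are illustrative. The toy example shows why such a loss differs from MSE: two distortions with identical MSE receive different perceptual losses.

```python
# Minimal sketch: a perceptual (SSIM-style) reconstruction loss vs. MSE.
# Assumption: a global, single-window SSIM stands in for the paper's
# perceptual metrics; inputs are flattened spectrogram values in [0, 1].

def ssim(x, y, data_range=1.0):
    """Global SSIM between two equal-length sequences."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure

def ssim_loss(x, x_hat):
    """Perceptual reconstruction loss: 0 for identical inputs."""
    return 1.0 - ssim(x, x_hat)

def mse(x, x_hat):
    """Plain mean squared error, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Toy 4x4 "spectrogram", flattened row by row: low values in the
# first two columns, high values in the last two.
x = [0.2, 0.2, 0.8, 0.8] * 4

# Two distortions with the same MSE (0.01):
shifted = [v + 0.1 for v in x]                  # uniform level offset
checker = [v + 0.1 * (-1) ** (i // 4 + i % 4)   # checkerboard noise pattern
           for i, v in enumerate(x)]

print(mse(x, shifted), mse(x, checker))              # both about 0.01
print(ssim_loss(x, shifted), ssim_loss(x, checker))  # about 0.016 vs about 0.052
```

In an autoencoder, `x_hat` would be the decoder's reconstruction and `ssim_loss` would take the place of MSE as the training objective. A practical version would need a differentiable framework and, typically, a windowed or multi-scale metric rather than this global one.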
Findings
Features extracted from autoencoders trained with perceptual losses outperform using the same metrics directly as distances for classification.
The results suggest that perceptual loss functions improve generalization to unseen music signals.
Using perceptual metrics as loss functions lets models capture perceptually meaningful features that reflect human perception.
Abstract
The subjective quality of natural signals can be approximated with objective perceptual metrics. Designed to approximate the perceptual behaviour of human observers, perceptual metrics often reflect structures found in natural signals and neurological pathways. Models trained with perceptual metrics as loss functions can capture perceptually meaningful features from the structures held within these metrics. We demonstrate that using features extracted from autoencoders trained with perceptual losses can improve performance on music understanding tasks, i.e. genre classification, over using these metrics directly as distances when learning a classifier. This result suggests improved generalisation to novel signals when using perceptual metrics as loss functions for representation learning.
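The abstract contrasts two uses of a perceptual metric. In the baseline, the metric serves directly as a distance when learning a classifier. A minimal sketch of such a baseline follows, under stated assumptions: a k-nearest-neighbour classifier (the summary does not name the classifier), a global SSIM as the stand-in perceptual metric, and hypothetical toy "spectrogram" vectors and genre labels. The names `perceptual_distance` and `knn_predict` are illustrative, not from the paper.

```python
# Minimal sketch of the "metric as distance" baseline: a k-nearest-neighbour
# genre classifier whose distance is a perceptual metric (1 - global SSIM).
# Assumptions: k-NN as the classifier, global SSIM as the perceptual metric,
# and hypothetical toy data; none of these are taken from the paper.
from collections import Counter

def ssim(x, y, data_range=1.0):
    """Global SSIM between two equal-length sequences."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def perceptual_distance(x, y):
    """0 for identical inputs; grows as perceived structure diverges."""
    return 1.0 - ssim(x, y)

def knn_predict(query, train, k, distance):
    """Majority vote among the k training items closest to `query`."""
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 4-bin "spectrogram frames" with genre labels.
train = [
    ([0.9, 0.7, 0.2, 0.1], "rock"),
    ([0.8, 0.8, 0.3, 0.1], "rock"),
    ([0.1, 0.2, 0.7, 0.9], "jazz"),
    ([0.2, 0.1, 0.8, 0.8], "jazz"),
]
query = [0.85, 0.75, 0.25, 0.15]
print(knn_predict(query, train, k=3, distance=perceptual_distance))  # rock
```

The approach the paper favours moves the metric to a different place: it becomes the autoencoder's training loss, and the classifier then works on the learned encoder features instead of on metric distances between raw inputs.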
Taxonomy
Topics: Music and Audio Processing
