Content-based Music Similarity with Triplet Networks
Joseph Cleveland, Derek Cheng, Michael Zhou, Thorsten Joachims, Douglas Turnbull

TL;DR
This paper investigates using triplet neural networks to embed songs by content similarity, compares two triplet selection methods, and reports initial success on a simple artist retrieval task.
Contribution
The paper applies a triplet network to music embedding and compares two ways of choosing the third (different-artist) song in each triplet: at random or based on shared genre labels.
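The two selection strategies can be sketched as follows. This is a minimal illustration, not the authors' procedure: the song schema (`id`, `artist`, `genre`), the function name `sample_triplet`, and the toy catalogue are all hypothetical. The source says only that the third song is picked "based on shared genre labels"; the sketch assumes this means a different-artist song that shares the anchor's genre, with a fallback to random selection when no such song exists.

```python
import random
from collections import defaultdict


def sample_triplet(songs, rng, genre_based=False):
    """Return (anchor_id, positive_id, negative_id).

    songs: list of dicts with 'id', 'artist', 'genre' keys (hypothetical schema).
    """
    by_artist = defaultdict(list)
    for song in songs:
        by_artist[song["artist"]].append(song)

    # An anchor artist needs at least two songs to form an (anchor, positive) pair.
    eligible = [a for a, tracks in by_artist.items() if len(tracks) >= 2]
    artist = rng.choice(eligible)
    anchor, positive = rng.sample(by_artist[artist], 2)

    # Candidate negatives: every song by a different artist.
    negatives = [s for s in songs if s["artist"] != artist]
    if genre_based:
        # Assumption: "genre-based" means the negative shares the anchor's genre.
        same_genre = [s for s in negatives if s["genre"] == anchor["genre"]]
        if same_genre:  # fall back to random selection if none is available
            negatives = same_genre
    negative = rng.choice(negatives)
    return anchor["id"], positive["id"], negative["id"]


# Toy catalogue (made-up data, for illustration only).
songs = [
    {"id": 0, "artist": "A", "genre": "rock"},
    {"id": 1, "artist": "A", "genre": "rock"},
    {"id": 2, "artist": "B", "genre": "rock"},
    {"id": 3, "artist": "C", "genre": "jazz"},
    {"id": 4, "artist": "C", "genre": "jazz"},
    {"id": 5, "artist": "D", "genre": "jazz"},
]
triplet = sample_triplet(songs, random.Random(0), genre_based=True)
```

Under this reading, same-genre negatives are harder to separate from the anchor than random ones, so the two strategies may lead to different embeddings.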
Findings
Shallow Siamese networks can embed music for artist retrieval.
Genre-based triplet selection improves embedding quality.
Initial results show feasibility for content-based music similarity.
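The source does not describe how artist retrieval is scored. One plausible reading, sketched here purely as an assumption, is a nearest-neighbor check: for each song, retrieve the closest other song in the embedding space and count a hit when both share an artist. The function name `top1_artist_accuracy` and the toy vectors are hypothetical.

```python
def top1_artist_accuracy(embeddings, artists):
    """Fraction of songs whose nearest other song is by the same artist.

    embeddings: list of equal-length lists of floats (at least two songs).
    artists: parallel list of artist labels.
    """
    hits = 0
    for i, query in enumerate(embeddings):
        best_j, best_d = None, float("inf")
        for j, candidate in enumerate(embeddings):
            if j == i:
                continue  # a song must not retrieve itself
            d = sum((a - b) ** 2 for a, b in zip(query, candidate))
            if d < best_d:
                best_j, best_d = j, d
        hits += artists[best_j] == artists[i]
    return hits / len(embeddings)


# Toy embeddings (made-up values): two tight clusters.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
score = top1_artist_accuracy(toy_embeddings, ["A", "A", "B", "B"])
```

On the toy data, each song's nearest neighbor lies in its own cluster, so the score depends only on whether the artist labels follow the clusters.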
Abstract
We explore the feasibility of using triplet neural networks to embed songs based on content-based music similarity. Our network is trained using triplets of songs such that two songs by the same artist are embedded closer to one another than to a third song by a different artist. We compare two models that are trained using different ways of picking this third song: at random vs. based on shared genre labels. Our experiments are conducted using songs from the Free Music Archive and use standard audio features. The initial results show that shallow Siamese networks can be used to embed music for a simple artist retrieval task.
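The training constraint in the abstract (two songs by the same artist should be embedded closer together than to a song by a different artist) is commonly expressed as a margin-based triplet loss. The abstract does not state the exact loss, distance, or margin, so the squared Euclidean distance and `margin=1.0` below are assumptions, and the function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on already-embedded songs.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (an assumed hyperparameter).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (made-up values).
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 0.0]
satisfied = triplet_loss(anchor, positive, negative)  # constraint already met
violated = triplet_loss(anchor, negative, positive)   # roles swapped
```

In a triplet (Siamese-style) network, the same embedding function with shared weights would map all three songs' audio features to vectors before this loss is computed; minimizing it over many triplets pulls same-artist songs together and pushes different-artist songs apart.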
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
