Audio-based Distributional Semantic Models for Music Auto-tagging and Similarity Measurement
Giannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos

TL;DR
This paper applies Audio-based Distributional Semantic Models, which embed audio and lexical information in a joint acoustic-semantic space, to music auto-tagging and similarity measurement, outperforming the state of the art on similarity measurement between music clips.
Contribution
It applies joint acoustic-semantic representations to automatic tag generation and combines the predicted tags with their acoustic representations to build acoustic-semantic clip embeddings for music similarity measurement.
Findings
Outperforms the state of the art in music similarity measurement
Produces high-quality tags for audio clips
Demonstrates effective joint acoustic-semantic embeddings
Abstract
The recent development of Audio-based Distributional Semantic Models (ADSMs) enables the computation of audio and lexical vector representations in a joint acoustic-semantic space. In this work, these joint representations are applied to the problem of automatic tag generation. The predicted tags, together with their corresponding acoustic representations, are exploited for the construction of acoustic-semantic clip embeddings. The proposed algorithms are evaluated on the task of similarity measurement between music clips. Acoustic-semantic models are shown to outperform the state of the art for this task and to produce high-quality tags for audio/music clips.
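The pipeline described in the abstract (tag prediction in a joint space, clip embeddings built from predicted tags plus acoustic vectors, then clip-to-clip similarity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the toy 3-dimensional vectors, the function names, the use of cosine similarity for ranking tags, and the choice to form a clip embedding by concatenating the acoustic vector with the mean of the predicted tag vectors.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_tags(clip_vec, tag_vecs, k=2):
    """Return the k tags whose vectors are closest (by cosine) to the clip vector.

    Assumes the clip vector and the tag vectors live in the same joint space.
    """
    scores = {tag: cosine(clip_vec, vec) for tag, vec in tag_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def clip_embedding(clip_vec, tag_vecs, k=2):
    """Hypothetical clip embedding: acoustic vector concatenated with the
    mean of the vectors of the k predicted tags."""
    tags = predict_tags(clip_vec, tag_vecs, k)
    semantic = np.mean([tag_vecs[t] for t in tags], axis=0)
    return np.concatenate([clip_vec, semantic])

# Toy joint space (made-up values, for illustration only).
tag_vecs = {
    "rock": np.array([1.0, 0.0, 0.0]),
    "guitar": np.array([0.8, 0.6, 0.0]),
    "piano": np.array([0.0, 0.0, 1.0]),
}
clip_a = np.array([0.9, 0.2, 0.1])
clip_b = np.array([0.7, 0.5, 0.0])
clip_c = np.array([0.1, 0.0, 0.9])

emb_a = clip_embedding(clip_a, tag_vecs)
emb_b = clip_embedding(clip_b, tag_vecs)
emb_c = clip_embedding(clip_c, tag_vecs)

print(predict_tags(clip_a, tag_vecs))
print(cosine(emb_a, emb_b), cosine(emb_a, emb_c))
```

In this toy setup, the two guitar-like clips end up closer to each other than to the piano-like clip, which is the kind of behavior a similarity evaluation between music clips would measure.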
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
