Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang, Shi Zong, Jianbing Zhang, Jiajun Chen, Hongfu Liu

TL;DR
This paper introduces music-to-text synaesthesia, a new task that generates descriptive text from music recordings. It contributes a new dataset of 1,955 aligned music-text pairs and a model trained with a group topology-preservation loss, which outperforms five baseline methods.
Contribution
The paper presents the first dataset and model for music-to-text synaesthesia, advancing understanding of music content through descriptive text generation.
Findings
The proposed model outperforms five baseline methods.
The topology-preservation loss improves descriptive accuracy.
Qualitative analysis confirms the model's effectiveness.
Abstract
In this paper, we consider a novel research problem: music-to-text synaesthesia. Unlike the classical music tagging problem, which classifies a music recording into pre-defined categories, music-to-text synaesthesia aims to generate descriptive texts from music recordings with the same sentiment for further understanding. As existing music-related datasets do not contain semantic descriptions of music recordings, we collect a new dataset that contains 1,955 aligned pairs of classical music recordings and text descriptions. Based on this, we build a computational model to generate sentences that can describe the content of the music recording. To tackle the highly non-discriminative classical music, we design a group topology-preservation loss, which considers more samples as a group reference and preserves the relative topology among different samples. Extensive experimental…
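The abstract names a group topology-preservation loss but does not give its formula, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that "topology" means the pattern of pairwise cosine similarities within a group (batch) of samples, and that the music-side pattern is pulled toward the text-side pattern with a row-wise KL divergence. The function names, the `temperature` parameter, and the choice of KL divergence are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_matrix(z, eps=1e-12):
    """Pairwise cosine similarities between the rows of z (shape: n x d)."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    return z @ z.T

def row_softmax(x):
    """Numerically stable softmax over each row."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def group_topology_loss(music_emb, text_emb, temperature=0.1):
    """Hypothetical group-level loss (an assumption, not the paper's formula).

    Each sample's similarities to every sample in the group form a
    distribution; the loss is the mean KL divergence from the text-side
    distribution (reference) to the music-side distribution.
    """
    p = row_softmax(cosine_similarity_matrix(text_emb) / temperature)
    q = row_softmax(cosine_similarity_matrix(music_emb) / temperature)
    kl_per_sample = np.sum(p * (np.log(p) - np.log(q)), axis=1)
    return float(kl_per_sample.mean())

# Usage: a group of 8 samples; the two embedding spaces may differ in size
# because only the n x n similarity structure is compared.
rng = np.random.default_rng(0)
music = rng.normal(size=(8, 16))
text = rng.normal(size=(8, 32))
loss_value = group_topology_loss(music, text)
```

Comparing whole similarity rows, rather than one positive pair against one negative pair, is one way to use "more samples as a group reference": every sample in the group contributes to how each embedding is positioned relative to the others.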
Code & Models
- 🤗 j-hartmann/emotion-english-distilroberta-base (model) · 609k downloads · ♡ 485
- 🤗 Linna/emotion-english-distilroberta-melinna (model) · 15 downloads
- 🤗 mrhacker7599/emotion-english-distilroberta-base (model) · 10 downloads
- 🤗 aabhijeeet/my-model-1 (model)
- 🤗 HARSHU550/Emotions (model) · 9 downloads
- 🤗 oxygeneDev/emotion-distilroberta-base (model) · 5 downloads
Taxonomy
Topics: Music and Audio Processing · Phonetics and Phonology Research · Diverse Musicological Studies
