MusicTM-Dataset for Joint Representation Learning among Sheet Music, Lyrics, and Musical Audio
Donghuo Zeng, Yi Yu, Keizo Oyama

TL;DR
This paper introduces the MusicTM-Dataset, a large, multi-modal music dataset with sheet music, lyrics, and synthesized audio, designed to enhance cross-modal retrieval and shared representation learning.
Contribution
The paper presents a new multi-modal music dataset with aligned sheet music, lyrics, and audio, enabling improved cross-modal retrieval and representation learning in music information retrieval.
Findings
Applied basic cross-modal retrieval methods to the dataset
Demonstrated the dataset's utility for shared representation learning
Made the dataset accessible for future research
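As a rough illustration of what cross-modal retrieval over a shared representation involves, the sketch below ranks audio clips against a lyrics query by cosine similarity. It assumes embeddings for both modalities already live in one shared space; the vectors, the clip names, and the `retrieve` helper are made up for illustration and are not taken from the paper or the dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, candidates, top_k=1):
    """Return the names of the top_k candidates most similar to the query.

    `candidates` maps an item name to its embedding in the shared space.
    """
    scored = [(cosine(query, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical shared-space embeddings for three audio clips.
audio_clips = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 1.0, 0.2],
    "clip_c": [0.1, 0.0, 1.0],
}

# Hypothetical embedding of a lyrics snippet used as the query.
lyrics_query = [0.8, 0.2, 0.1]

print(retrieve(lyrics_query, audio_clips, top_k=2))
```

The same ranking step works in any direction (sheet-music image to audio, audio to lyrics, and so on) once each modality has been mapped into the shared space; how that mapping is learned is the part the dataset is meant to support.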
Abstract
This work presents a music dataset named MusicTM-Dataset, which is used to improve the representation learning ability of different types of cross-modal retrieval (CMR). Few large music datasets that include three modalities are available for learning representations for CMR. To collect a music dataset, we expand the original musical notation to synthesize audio and generate sheet-music images, and we build a fine-grained alignment among the musical-notation-based sheet-music image, audio clip, and syllable-denotation text, such that the MusicTM-Dataset can be exploited to learn shared representations for multimodal data points. The MusicTM-Dataset presents three kinds of modalities: the image of sheet music, the text of lyrics, and synthesized audio. Their representations are extracted by some advanced models. In this paper, we introduce the background of music datasets and express…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization
