Audio-to-Score Conversion Model Based on Whisper methodology

Hongyao Zhang; Bohang Sun

arXiv:2410.17209·cs.SD·October 23, 2024

TL;DR

This paper presents a Whisper-based Transformer model that converts music audio into ABC notation. It introduces a new notation system and a custom tokenizer, and reports improved accuracy over traditional methods.

Contribution

It introduces the Orpheus' Score notation system, a custom tokenizer, and a comprehensive data processing workflow for audio-to-score conversion.

Findings

01. Significantly improved accuracy compared to traditional algorithms

02. Effective data augmentation through mutation mechanisms

03. Provides a practical tool for music enthusiasts and researchers

Abstract

This thesis develops a Transformer model based on Whisper, which extracts melodies and chords from music audio and records them into ABC notation. A comprehensive data processing workflow is customized for ABC notation, including data cleansing, formatting, and conversion, and a mutation mechanism is implemented to increase the diversity and quality of training data. This thesis innovatively introduces the "Orpheus' Score", a custom notation system that converts music information into tokens, designs a custom vocabulary library, and trains a corresponding custom tokenizer. Experiments show that compared to traditional algorithms, the model has significantly improved accuracy and performance. While providing a convenient audio-to-score tool for music enthusiasts, this work also provides new ideas and tools for research in music information processing.
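
To make the idea of turning notation into tokens concrete, the following is a minimal sketch of a tokenizer and vocabulary for a small subset of ABC tune bodies. It is not the paper's Orpheus' Score system or its trained tokenizer, whose details are not given on this page. The regular expression, the special tokens, and the helper names (`tokenize`, `build_vocab`, `encode`, `decode`) are all assumptions made for illustration.

```python
import re

# Hypothetical token pattern for a small subset of ABC tune bodies.
# Characters that match no alternative (e.g. spaces) are skipped.
TOKEN_RE = re.compile(r'''
    "[^"]*"                                                # chord symbol in quotes
  | (?:\^{1,2}|_{1,2}|=)?[A-Ga-g][,']*(?:\d+)?(?:/+\d*)?   # note with optional accidental, octave, length
  | z(?:\d+)?(?:/+\d*)?                                    # rest with optional length
  | \|\]|\|\||\|:|:\||\|                                   # bar lines
''', re.VERBOSE)

# Assumed special tokens; the paper's actual vocabulary is not shown here.
SPECIALS = ["<pad>", "<bos>", "<eos>", "<unk>"]


def tokenize(body):
    """Split an ABC tune body into chord, note, rest, and bar-line tokens."""
    return TOKEN_RE.findall(body)


def build_vocab(corpus):
    """Map special tokens, then every distinct token in the corpus, to integer ids."""
    symbols = sorted({tok for body in corpus for tok in tokenize(body)})
    return {tok: i for i, tok in enumerate(SPECIALS + symbols)}


def encode(body, vocab):
    """Convert a tune body to ids, wrapped in <bos>/<eos>; unseen tokens become <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(body)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


def decode(ids, vocab):
    """Convert ids back to a space-separated token string, dropping special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


body = '"Am"A2 B/2c/2 | ^f2 z2 |]'
vocab = build_vocab([body])
ids = encode(body, vocab)
print(tokenize(body))
print(decode(ids, vocab))
```

In a sequence-to-sequence setup of the kind the abstract describes, ids like these would serve as decoder targets paired with audio features. A production tokenizer would also need to cover headers, ties, tuplets, and the remaining ABC syntax, which this sketch ignores.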

Code & Models

Datasets

BOB12311/Orpheus_Hearing (dataset · 795 dl)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam · Dropout