PKSpell: Data-Driven Pitch Spelling and Key Signature Estimation
Francesco Foscarin (CNAM), Nicolas Audebert, Raphaël Fournier-S'Niehotta

TL;DR
PKSpell is a data-driven deep learning system that jointly estimates pitch spelling and key signatures from MIDI files. Both outputs are needed to produce a full musical score, and they support music information retrieval tasks such as harmonic analysis and melodic similarity.
Contribution
The paper introduces a deep recurrent neural network for joint pitch spelling and key signature estimation from MIDI data, proposes a data augmentation procedure for retraining on small datasets, and reports state-of-the-art results on multiple datasets.
Findings
Achieves strong key signature estimation performance on a challenging dataset.
Sets a new state of the art in pitch spelling on MuseData.
Remains effective when retrained on small datasets, thanks to data augmentation.
Abstract
We present PKSpell: a data-driven approach for the joint estimation of pitch spelling and key signatures from MIDI files. Both elements are fundamental for the production of a full-fledged musical score and facilitate many MIR tasks such as harmonic analysis, section identification, melodic similarity, and search in a digital music library. We design a deep recurrent neural network model that only requires information readily available in all kinds of MIDI files, including performances, or other symbolic encodings. We release a model trained on the ASAP dataset. Our system can be used with these pre-trained parameters and is easy to integrate into a MIR pipeline. We also propose a data augmentation procedure that helps retraining on small datasets. PKSpell achieves strong key signature estimation performance on a challenging dataset. Most importantly, this model establishes a new…
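To make the pitch spelling problem concrete: a MIDI note number encodes only the sounding pitch, so several written names are possible for it. The following is a minimal sketch of that ambiguity, not the PKSpell model. The function name `candidate_spellings` and the choice to allow up to double accidentals are assumptions made for illustration.

```python
# Illustrative sketch only: enumerates the enharmonic spellings a pitch
# spelling system has to choose between. This is NOT the PKSpell model.

# Natural note names and their pitch classes (semitones above C).
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Accidentals considered (assumption: double flat up to double sharp),
# expressed as semitone offsets from the natural note.
ACCIDENTALS = {"bb": -2, "b": -1, "": 0, "#": 1, "##": 2}


def candidate_spellings(midi_pitch: int) -> list[str]:
    """Return every note name (without octave) that sounds as midi_pitch."""
    pitch_class = midi_pitch % 12
    names = []
    for letter, base in NATURALS.items():
        for accidental, offset in ACCIDENTALS.items():
            if (base + offset) % 12 == pitch_class:
                names.append(letter + accidental)
    return names


if __name__ == "__main__":
    # MIDI note 61 is the black key just above middle C.
    print(candidate_spellings(61))
```

Choosing among these candidates depends on musical context such as the key signature, which is one reason to estimate both jointly.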
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
