Learning to Transcribe by Ear
Rainer Kelz, Gerhard Widmer

TL;DR
This paper frames polyphonic transcription as a reinforcement learning task: an agent plays an instrument in an environment that also contains the sound source, and earns reward by playing along with what it hears, much as a human musician would. It reports promising results in constrained settings.
Contribution
It formalizes polyphonic transcription as a reinforcement learning problem, providing a new conceptual framework and practical insights for training transcription agents.
Findings
Promising results in partially constrained environments.
Reinforcement learning can model the transcription process effectively.
Framework aligns with human musical transcription behavior.
Abstract
Rethinking how to model polyphonic transcription formally, we frame it as a reinforcement learning task. Such a task formulation encompasses the notion of a musical agent and an environment containing an instrument as well as the sound source to be transcribed. Within this conceptual framework, the transcription process can be described as the agent interacting with the instrument in the environment, and obtaining reward by playing along with what it hears. Choosing from a discrete set of actions - the notes to play on its instrument - the amount of reward the agent experiences depends on which notes it plays and when. This process resembles how a human musician might approach the task of transcription, and the satisfaction she achieves by closely mimicking the sound source to transcribe on her instrument. Following a discussion of the theoretical framework and the benefits of modelling…
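To make the agent/environment framing in the abstract concrete, here is a minimal sketch in Python. It is not the authors' environment: the class `ToyPlayAlongEnv`, the use of a known piano roll as the "sound source", the frame index as the observation, and the hits-minus-mismatches reward are all assumptions chosen for illustration. It only shows the general shape of the idea, in which reward depends on which notes are played and when.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Notes = FrozenSet[int]  # MIDI pitches sounding in one time frame


@dataclass
class ToyPlayAlongEnv:
    """Hypothetical toy environment; the 'sound source' is a known piano roll."""
    target: List[Notes]  # one set of pitches per time frame
    t: int = 0

    def reset(self) -> int:
        self.t = 0
        # Observation is just the frame index (an assumption; a real agent
        # would observe audio, not the position in the piece).
        return self.t

    def step(self, played: Notes) -> Tuple[int, float, bool]:
        heard = self.target[self.t]
        hits = len(played & heard)        # notes played that are sounding
        mismatches = len(played ^ heard)  # extra notes plus missed notes
        reward = float(hits - mismatches)  # assumed reward, for illustration
        self.t += 1
        done = self.t >= len(self.target)
        return self.t, reward, done


def run_episode(env: ToyPlayAlongEnv, policy: Callable[[int], Notes]) -> float:
    """Play one episode and return the total reward collected."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    roll = [frozenset({60}), frozenset({60, 64}), frozenset()]
    env = ToyPlayAlongEnv(roll)
    print(run_episode(env, lambda t: roll[t]))      # plays along perfectly
    print(run_episode(env, lambda t: frozenset()))  # stays silent
```

In this toy setting, a policy that reproduces the target exactly collects the maximum reward, while a silent policy is penalised for every missed note. A learned policy would have to infer the notes from what it hears rather than from the frame index.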
Taxonomy
Topics: Music Technology and Sound Studies · Reinforcement Learning in Robotics · Music and Audio Processing
