# Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices

**Authors:** Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama

arXiv: 1701.08343 · 2017-01-31

## TL;DR

This paper presents a merged-output hidden Markov model (HMM) for rhythm transcription of polyphonic piano music that explicitly models multiple voices. It outperforms six other algorithms by more than 12 points in accuracy on polyrhythmic performances and performs nearly as well as the best method on non-polyrhythmic ones.

## Contribution

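The paper gives a complete description of a merged-output HMM that explicitly models the multiple-voice structure of polyphonic music, including polyrhythms and loose synchrony between voices. It also develops an inference technique that is valid for any merged-output HMM whose output probabilities depend on past events.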
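The "merged output" idea can be illustrated with a minimal generative sketch. It assumes two voices, each a small discrete-output HMM with hypothetical toy parameters (`VOICES`, `ALPHA`, and the symbols `a` to `d` are illustrative, not taken from the paper), and follows a basic merged-output construction: at each event one voice is selected, only that voice transitions and emits, and the other voice keeps its state. The paper's actual model operates on performed MIDI onsets and is richer than this.

```python
import random

# Hypothetical toy parameters (not from the paper): each voice is a small HMM
# whose states stand in for rhythmic categories and whose outputs are symbols.
VOICES = [
    {"init": [1.0, 0.0], "trans": [[0.7, 0.3], [0.4, 0.6]],
     "emit": [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]},
    {"init": [0.5, 0.5], "trans": [[0.5, 0.5], [0.1, 0.9]],
     "emit": [{"c": 1.0}, {"d": 1.0}]},
]
ALPHA = [0.6, 0.4]  # probability that voice 0 / voice 1 produces the next event


def pick(rng, weights):
    """Draw an index according to the given (unnormalised) probabilities."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def sample_merged(n_events, seed=0):
    """Generate one merged event stream from two voice HMMs.

    At each step a voice s is chosen with probability ALPHA[s]; only that voice
    transitions and emits, while the other voice keeps its state. The merged
    output does not record which voice produced each event, so the labels are
    returned separately as the hidden ground truth.
    """
    rng = random.Random(seed)
    states = [pick(rng, v["init"]) for v in VOICES]
    merged, labels = [], []
    for _ in range(n_events):
        s = pick(rng, ALPHA)
        voice = VOICES[s]
        states[s] = pick(rng, voice["trans"][states[s]])
        symbols = list(voice["emit"][states[s]])
        weights = [voice["emit"][states[s]][k] for k in symbols]
        merged.append(symbols[pick(rng, weights)])
        labels.append(s)
    return merged, labels
```

For example, `sample_merged(10, seed=1)` returns an interleaved symbol list together with the hidden voice labels; a transcriber observes only the first and must infer the second.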

## Key findings

- Outperformed six other algorithms by more than 12 points in accuracy on polyrhythmic performances.
- Performed nearly as well as the best method for non-polyrhythmic music.
- Established, for the first time in the literature, which rhythm transcription methods are state of the art, and released public source code for future comparisons.

## Abstract

In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods, which could not properly describe the nature of multiple voices, as in polyrhythmic scores or in the phenomenon of loose synchrony between voices. In this paper we present a complete description of the proposed model and develop an inference technique that is valid for any merged-output HMM whose output probabilities depend on past events. We also examine the influence of the architecture and parameters of the method on the accuracies of rhythm transcription and voice separation, and we perform comparative evaluations with six other algorithms. Using MIDI recordings of classical piano pieces, we found that the proposed model outperformed the other methods by more than 12 points in accuracy for polyrhythmic performances and performed almost as well as the best one for non-polyrhythmic performances. This comparison reveals, for the first time in the literature, which rhythm transcription methods are state of the art. Publicly available source code is also provided for future comparisons.
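Voice separation in a merged-output HMM amounts to decoding the hidden merged state. The sketch below is a plain Viterbi decoder over pairs of component states, assuming a toy two-voice model with hypothetical parameters (`VOICES`, `ALPHA`, and the function name `viterbi_merged` are illustrative) in which output probabilities depend only on the current state. The paper's inference technique, which handles output probabilities that depend on past events, is not reproduced here.

```python
import math
from itertools import product

NEG_INF = float("-inf")

# Hypothetical toy parameters (not from the paper).
VOICES = [
    {"init": [1.0, 0.0], "trans": [[0.7, 0.3], [0.4, 0.6]],
     "emit": [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]},
    {"init": [0.5, 0.5], "trans": [[0.5, 0.5], [0.1, 0.9]],
     "emit": [{"c": 1.0}, {"d": 1.0}]},
]
ALPHA = [0.6, 0.4]  # probability that voice 0 / voice 1 produces the next event


def log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_merged(obs, voices, alpha):
    """Return the most likely emitting voice (0 or 1) for each observed event.

    The hidden state after each event is the pair (i0, i1) of component-HMM
    states plus the index s of the voice that emitted. Only voice s transitions;
    the other voice keeps its state. Because later steps depend only on the
    pair, s is maximised out inside each pair and kept in the back-pointer.
    """
    sizes = [len(v["init"]) for v in voices]
    pairs = list(product(range(sizes[0]), range(sizes[1])))
    # Log-score of each state pair before the first event.
    delta = {p: log(voices[0]["init"][p[0]]) + log(voices[1]["init"][p[1]])
             for p in pairs}
    back = []
    for o in obs:
        new_delta, ptr = {}, {}
        for p in pairs:
            best, arg = NEG_INF, None
            for s in (0, 1):
                emit = log(voices[s]["emit"][p[s]].get(o, 0.0))
                if emit == NEG_INF:
                    continue
                for j in range(sizes[s]):
                    q = list(p)
                    q[s] = j  # predecessor differs from p only in voice s
                    q = tuple(q)
                    score = (delta[q] + log(alpha[s])
                             + log(voices[s]["trans"][j][p[s]]) + emit)
                    if score > best:
                        best, arg = score, (s, q)
            new_delta[p] = best
            ptr[p] = arg
        delta = new_delta
        back.append(ptr)
    p = max(delta, key=delta.get)
    if delta[p] == NEG_INF:
        raise ValueError("observation sequence has zero probability")
    labels = []
    for ptr in reversed(back):
        s, p = ptr[p]
        labels.append(s)
    return labels[::-1]
```

With these toy parameters the two voices use disjoint symbols, so `viterbi_merged(["a", "c", "b", "d"], VOICES, ALPHA)` returns `[0, 1, 0, 1]`. With overlapping alphabets the decoder must weigh selection, transition, and output probabilities jointly. The cost per event grows as K0 · K1 · (K0 + K1) for component state counts K0 and K1, which is why merged state spaces become expensive as voices are added.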

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.08343/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1701.08343/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1701.08343/full.md

---
Source: https://tomesphere.com/paper/1701.08343