Audio to score matching by combining phonetic and duration information

Rong Gong; Jordi Pons; Xavier Serra

arXiv:1707.03547·cs.SD·July 13, 2017·1 cites

TL;DR

This paper presents a novel approach for matching jingju singing audio to scores by integrating phonetic and duration information, improving accuracy over methods relying solely on melodic contours.

Contribution

It introduces a matching method that combines phonetic and duration information, evaluates three acoustic models (CNNs, DNNs, and GMMs), and compares duration models, all tailored to jingju a cappella singing.

Findings

01. CNNs outperform DNNs and GMMs on small datasets
02. HSMM outperforms post-processor duration models
03. Combining phonetic and duration information improves matching accuracy

Abstract

We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case. We argue that, due to the existence of a basic melodic contour for each mode in jingju music, only using melodic information (such as pitch contour) will result in an ambiguous matching. This leads us to propose a matching approach based on the use of phonetic and duration information. Phonetic information is extracted with an acoustic model shaped with our data, and duration information is considered with the Hidden Markov Models (HMMs) variants we investigate. We build a model for each lyric path in our scores and we achieve the matching by ranking the posterior probabilities of the decoded most likely state sequences. Three acoustic models are investigated: (i) convolutional neural networks (CNNs), (ii) deep neural…
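
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain left-to-right HMM with one state per phone, fixed stay/move transition probabilities of 0.5, and toy frame-wise phone probabilities standing in for the output of an acoustic model. The names `viterbi_left_to_right`, `rank_candidates`, `phrase_a`, and `phrase_b` are hypothetical. Explicit duration modelling (for example with an HSMM) is left out, so durations are only implicitly geometric here.

```python
import math

NEG_INF = float("-inf")


def viterbi_left_to_right(frame_logp, phones,
                          log_stay=math.log(0.5), log_move=math.log(0.5)):
    """Log score of the best state path through a left-to-right HMM.

    frame_logp: list of dicts, one per audio frame, mapping phone -> log-probability
                (assumed to come from some acoustic model).
    phones:     phone sequence of one candidate phrase, one HMM state per phone.
    """
    n_states = len(phones)
    # The path is forced to start in the first state.
    prev = [NEG_INF] * n_states
    prev[0] = frame_logp[0].get(phones[0], NEG_INF)
    for obs in frame_logp[1:]:
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            stay = prev[s] + log_stay
            move = prev[s - 1] + log_move if s > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                cur[s] = best + obs.get(phones[s], NEG_INF)
        prev = cur
    # The path is forced to end in the last state.
    return prev[-1]


def rank_candidates(frame_logp, candidates):
    """Return candidate names sorted from best to worst Viterbi score."""
    scores = {name: viterbi_left_to_right(frame_logp, phones)
              for name, phones in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


def logp(dist):
    """Convert a phone -> probability dict into log-probabilities."""
    return {phone: math.log(p) for phone, p in dist.items()}


# Toy data: three frames of phone probabilities and two candidate phrases.
frames = [
    logp({"n": 0.8, "i": 0.1, "h": 0.1}),
    logp({"n": 0.2, "i": 0.7, "h": 0.1}),
    logp({"n": 0.1, "i": 0.8, "h": 0.1}),
]
candidates = {"phrase_a": ["n", "i"], "phrase_b": ["h", "i"]}
ranking = rank_candidates(frames, candidates)
print(ranking)
```

In this toy case the first frame strongly favours the phone "n", so the candidate whose phone sequence starts with "n" is ranked first. A real system would replace the toy probabilities with acoustic model output and the fixed transitions with a learned or explicit duration model.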

Code & Models

Repositories

ronggong/jingjuSingingPhraseMatching
none · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies