Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

Xuan Shi; Erica Cooper; Xin Wang; Junichi Yamagishi; Shrikanth Narayanan

arXiv:2211.13868·cs.SD·March 22, 2023

TL;DR

This paper examines how techniques from end-to-end text-to-speech models can improve neural MIDI-to-audio synthesis, refining feature computation, model selection, and training strategy to produce more natural-sounding music.

Contribution

The paper analyzes the shortcomings of a TTS-based MIDI-to-audio system and improves it in feature computation, model selection, and training strategy, then evaluates the result through listening tests, pitch measurement, and spectrogram analysis.

Findings

01 Achieved highly natural-sounding music synthesis

02 Demonstrated improvements through listening tests and spectrogram analysis

03 Provided open-source code and models for community use

Abstract

Given the similarity between music and speech synthesis from symbolic input and the rapid development of text-to-speech (TTS) techniques, it is worthwhile to explore ways to improve MIDI-to-audio performance by borrowing from TTS techniques. In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and training strategy, aiming to synthesize highly natural-sounding audio. Moreover, we conducted an extensive model evaluation through listening tests, pitch measurement, and spectrogram analysis. This work not only demonstrates the synthesis of highly natural music but also offers a thorough analytical approach and useful outcomes for the community. Our code, pre-trained models, supplementary materials, and audio samples are open-sourced at https://github.com/nii-yamagishilab/midi-to-audio.
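
The abstract lists pitch measurement among the evaluation methods but does not describe the procedure. As a rough illustration only, and not the authors' evaluation code, the sketch below shows one common way to compare a measured fundamental frequency against the pitch a MIDI note implies. It assumes 12-tone equal temperament with A4 (MIDI note 69) tuned to 440 Hz; the function names `midi_to_hz` and `cents_error` are hypothetical and do not come from the paper or its repository.

```python
import math

# Assumption: 12-tone equal temperament, A4 = MIDI note 69 = 440 Hz.
A4_HZ = 440.0
A4_MIDI = 69


def midi_to_hz(note: int) -> float:
    """Return the nominal frequency in Hz of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def cents_error(measured_hz: float, note: int) -> float:
    """Return the deviation of a measured F0 from the target note, in cents.

    Positive values mean the measured pitch is sharp, negative values flat.
    One semitone corresponds to 100 cents.
    """
    return 1200.0 * math.log2(measured_hz / midi_to_hz(note))


if __name__ == "__main__":
    # Hypothetical measurement: a synthesized A4 whose estimated F0 is 442 Hz.
    print(f"{cents_error(442.0, 69):.2f} cents")
```

Reporting the error in cents rather than Hz keeps the measure comparable across registers, since a fixed Hz offset is a much larger musical deviation for low notes than for high ones.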

Code & Models

Repositories

nii-yamagishilab/midi-to-audio (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing