ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation

Shoule Wu; Ziqiang Shi

arXiv:2201.12519 · cs.SD · April 15, 2022

TL;DR

ItôWave is a vocoder built on a pair of forward and reverse-time linear stochastic differential equations. Conditioned on a mel spectrogram, it transforms noise into audio and reports a mean opinion score (MOS) of 4.35 ± 0.115, which the authors state exceeds current state-of-the-art methods.

Contribution

The paper presents a new wave generation model using forward and reverse SDEs, offering a probabilistic approach that improves audio quality over existing methods.

Findings

01

ItôWave reaches a mean opinion score (MOS) of 4.35 ± 0.115, exceeding current state-of-the-art (SOTA) methods.

02

Generated audio samples demonstrate high realism.

03

The SDE-based approach effectively models wave distribution transformations.

Abstract

In this paper, we propose a vocoder based on a pair of forward and reverse-time linear stochastic differential equations (SDEs). The solutions of this SDE pair are two stochastic processes. One turns the distribution of the waves we want to generate into a simple, tractable distribution; the other is the generation procedure that turns this simple, tractable signal into the target wave. The model is called ItôWave. ItôWave uses the Wiener process as a driver to gradually subtract the excess signal from the noise signal and generate realistic, meaningful audio, conditioned on the original mel spectrogram. Experimental results show that the mean opinion score (MOS) of ItôWave exceeds that of current state-of-the-art (SOTA) methods, reaching 4.35 ± 0.115. The generated audio samples are available online.
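
To make the forward and reverse-time SDE pair concrete, here is a minimal toy sketch in Python. It is not the paper's model. The abstract does not state the drift or diffusion coefficients, so the sketch assumes a variance-preserving linear SDE, dx = -0.5·β·x dt + √β dW, with a constant β. It also replaces the waveform distribution with a one-dimensional Gaussian, whose score (the gradient of the log density) is known in closed form. In a score-based generative model that quantity would normally come from a trained network conditioned on the mel spectrogram. All names and constants (`BETA`, `T`, `MU0`, `S0`, `forward_sample`, `reverse_sample`) are hypothetical.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
BETA = 2.0          # constant noise rate of the assumed linear SDE
T = 5.0             # final time of the forward process
MU0, S0 = 3.0, 0.5  # toy "data" distribution N(MU0, S0^2)


def marginal_mean_var(t):
    """Mean and variance of x_t under the forward SDE started from N(MU0, S0^2)."""
    decay = np.exp(-BETA * t)
    mean = MU0 * np.sqrt(decay)
    var = S0 ** 2 * decay + (1.0 - decay)
    return mean, var


def score(x, t):
    """Closed-form score of the Gaussian marginal p_t (stand-in for a learned network)."""
    mean, var = marginal_mean_var(t)
    return -(x - mean) / var


def forward_sample(x0, t, rng):
    """Exact sample of x_t given x_0 for dx = -0.5*BETA*x dt + sqrt(BETA) dW."""
    decay = np.exp(-BETA * t)
    noise = rng.standard_normal(x0.shape)
    return x0 * np.sqrt(decay) + np.sqrt(1.0 - decay) * noise


def reverse_sample(n, steps, rng):
    """Euler-Maruyama integration of the reverse-time SDE from t = T down to t = 0."""
    dt = T / steps
    x = rng.standard_normal(n)  # start from the tractable prior, approximately N(0, 1)
    for i in range(steps):
        t = T - i * dt
        # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t), with g^2 = BETA.
        drift = -0.5 * BETA * x - BETA * score(x, t)
        x = x - drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = MU0 + S0 * rng.standard_normal(20000)
    noised = forward_sample(data, T, rng)
    generated = reverse_sample(20000, 1000, rng)
    print("forward  mean/std:", noised.mean(), noised.std())
    print("reverse  mean/std:", generated.mean(), generated.std())
```

Under these assumptions, the forward process pushes the toy data distribution toward a standard normal, and the reverse-time integration carries standard-normal noise back toward N(MU0, S0^2). This mirrors, in one dimension, the two processes the abstract describes.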

Taxonomy

Topics: Advanced Adaptive Filtering Techniques · Music and Audio Processing