PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Sang-Hoon Lee; Ha-Yeong Choi; Seong-Whan Lee

arXiv:2408.07547 · cs.SD · August 15, 2024

TL;DR

PeriodWave is a waveform generation model that explicitly captures the periodic features of audio signals through multi-period flow matching, producing high-fidelity output efficiently in tasks such as text-to-speech (TTS).

Contribution

The paper introduces a period-aware flow matching estimator and a multi-period estimator, along with a single period-conditional universal estimator for efficient, high-quality waveform synthesis.
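As a rough illustration of what "period-aware" could mean in practice, the sketch below folds a 1D waveform into a 2D view with one row per period, so that samples spaced one period apart line up in the same column. The reshaping scheme, the function names, and the period set are assumptions made for illustration; this page does not describe how PeriodWave actually builds its period-wise representations.

```python
def reshape_by_period(waveform, period):
    """Fold a 1D waveform into rows of length `period`, zero-padding the tail.

    Assumption: this fold is an illustrative stand-in, not the paper's
    confirmed procedure.
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    pad = (-len(waveform)) % period          # samples needed to fill the last row
    padded = list(waveform) + [0.0] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


# Hypothetical period set, chosen only for this example.
PERIODS = (1, 2, 3, 5, 7)


def multi_period_views(waveform, periods=PERIODS):
    """Return one folded view of the waveform per period."""
    return {p: reshape_by_period(waveform, p) for p in periods}


if __name__ == "__main__":
    views = multi_period_views([0.1, 0.2, 0.3, 0.4, 0.5])
    for period, rows in views.items():
        print(period, rows)
```

In each folded view, a column collects samples that are exactly `period` steps apart, which is one simple way a downstream network could be exposed to periodic structure at several scales.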

Findings

01 Outperforms previous models in Mel-spectrogram reconstruction

02 Achieves superior results in text-to-speech tasks

03 Effectively disentangles frequency information for high-fidelity generation

Abstract

Recently, universal waveform generation tasks have been investigated conditioned on various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown their powerful generative performance in other domains; however, they stay out of the limelight due to slow inference speed in waveform generation tasks. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we…
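For readers unfamiliar with flow matching, the following is a minimal sketch of the general idea of estimating a vector field and integrating it to generate a sample. It assumes the common straight-line (optimal-transport) path between a noise sample x0 and a data sample x1, which the abstract excerpt does not confirm for PeriodWave. The `estimator` callable, the function names, and the Euler solver are hypothetical, and Mel-spectrogram conditioning is omitted.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_vector_field(x0, x1):
    """Velocity of the straight-line path; constant in t under this assumption."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(estimator, x0, x1, t):
    """Mean squared error between the estimated and target vector fields."""
    x_t = interpolate(x0, x1, t)
    predicted = estimator(x_t, t)            # hypothetical estimator(x_t, t) -> vector
    target = target_vector_field(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(estimator, x0, steps):
    """Integrate dx/dt = estimator(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        velocity = estimator(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


if __name__ == "__main__":
    noise, data = [0.0, 0.0], [1.0, -2.0]
    # Oracle estimator that already knows the true velocity (illustration only).
    oracle = lambda x, t: target_vector_field(noise, data)
    print(flow_matching_loss(oracle, noise, data, t=0.3))
    print(euler_sample(oracle, noise, steps=4))
```

A trained estimator would replace the oracle, and the number of Euler steps trades synthesis speed against accuracy, which is the kind of speed and quality trade-off the abstract raises for iterative generative models.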

Code & Models

Repositories

sh-lee-prml/periodwave
PyTorch · Official

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Adaptive Filtering Techniques · Speech and Audio Processing

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings