Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

Sang-Hoon Lee; Ha-Yeong Choi; Seong-Whan Lee

arXiv:2408.08019 · cs.SD · August 16, 2024

TL;DR

This paper presents PeriodWave-Turbo, a waveform generation model that accelerates high-fidelity speech synthesis through adversarial flow matching optimization, cutting the number of inference steps and improving quality after only brief fine-tuning.

Contribution

It introduces a novel adversarial flow matching optimization method that enhances CFM models, enabling high-quality waveform generation with fewer steps and improved generalization.

Findings

01. Achieves a state-of-the-art PESQ score of 4.454 on LibriTTS

02. Reduces inference steps from 16 to 2-4

03. Requires only 1,000 fine-tuning steps for high performance

Abstract

This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model trained via adversarial flow matching optimization. Recently, conditional flow matching (CFM) generative models have been successfully adopted for waveform generation tasks, leveraging a single vector field estimation objective for training. Although these models can generate high-fidelity waveform signals, they require significantly more ODE steps than GAN-based models, which need only a single generation step. Additionally, the generated samples often lack high-frequency information because noisy vector field estimation fails to ensure high-frequency reproduction. To address these limitations, we enhance pre-trained CFM-based generative models by incorporating a fixed-step generator modification. We utilize reconstruction losses and adversarial feedback to accelerate high-fidelity…
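The trade-off between ODE step count and sample accuracy mentioned in the abstract can be illustrated with a minimal toy sketch. It is not the paper's model: it assumes a 1-D Gaussian source N(0, 1), a 1-D Gaussian target N(mu, sigma^2), a linear interpolation path x_t = (1 - t) x_0 + t x_1, and a plain Euler solver. Under those assumptions the marginal vector field has a closed form, so the solver's endpoint can be compared against the exact one. All function names and parameter values here are hypothetical.

```python
def marginal_velocity(x, t, mu, sigma):
    """Closed-form marginal vector field E[x1 - x0 | x_t = x] for the toy
    Gaussian-to-Gaussian problem (an illustrative assumption, not a learned
    network)."""
    var_t = (1.0 - t) ** 2 + (t * sigma) ** 2      # variance of x_t
    slope = (t * sigma ** 2 - (1.0 - t)) / var_t   # Cov(x1 - x0, x_t) / Var(x_t)
    return mu + slope * (x - t * mu)


def euler_sample(x0, num_steps, mu=2.0, sigma=0.5):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed number of Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * marginal_velocity(x, t, mu, sigma)
    return x


def exact_endpoint(x0, mu=2.0, sigma=0.5):
    """Exact ODE solution at t=1 for this toy problem: x_1 = mu + sigma * x_0."""
    return mu + sigma * x0


if __name__ == "__main__":
    x0 = 1.0
    for n in (2, 4, 16):
        err = abs(euler_sample(x0, n) - exact_endpoint(x0))
        print(f"{n:2d} Euler steps: |error| = {err:.4f}")
```

In this toy setting the endpoint error shrinks as the step count grows, which is the cost a few-step generator has to make up for. The reconstruction losses and adversarial feedback described in the abstract are the paper's way of doing that for real waveforms; they are not modeled in this sketch.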

Code & Models

Repositories

sh-lee-prml/periodwave (PyTorch, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Advanced Optical Sensing Technologies

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings