Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis

Chin-Yun Yu; György Fazekas

arXiv:2406.05128·eess.AS·October 21, 2024

TL;DR

This paper introduces an efficient, differentiable, sample-wise time-varying linear prediction method for audio synthesis, improving end-to-end training and voice reconstruction quality over existing frame-wise approaches.

Contribution

The paper generalises the efficient time-invariant LP implementation from the GOLF vocoder to time-varying cases, so that the LP operator can be trained end-to-end sample-wise rather than through a frame-wise approximation.

Findings

01

GOLF with time-varying LP outperforms frame-wise versions in voice reconstruction.

02

In a listening test, synthesised outputs from GOLF scored higher in quality ratings than the state-of-the-art differentiable WORLD vocoder.

03

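Efficient differentiable sample-wise LP makes end-to-end training practical and avoids the mismatch between frame-wise training and sample-wise computation at test time.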

Abstract

Training the linear prediction (LP) operator end-to-end for audio synthesis in modern deep learning frameworks is slow due to its recursive formulation. In addition, frame-wise approximation as an acceleration method cannot generalise well to test time conditions where the LP is computed sample-wise. Efficient differentiable sample-wise LP for end-to-end training is the key to removing this barrier. We generalise the efficient time-invariant LP implementation from the GOLF vocoder to time-varying cases. Combining this with the classic source-filter model, we show that the improved GOLF learns LP coefficients and reconstructs the voice better than its frame-wise counterparts. Moreover, in our listening test, synthesised outputs from GOLF scored higher in quality ratings than the state-of-the-art differentiable WORLD vocoder.
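To make the distinction between frame-wise and sample-wise LP concrete, the following is a minimal NumPy sketch of a time-varying all-pole recursion. It is a naive reference loop, not the efficient differentiable implementation the paper describes. The function names, the sign convention y[n] = e[n] - sum_k a_k[n] y[n-k], and the use of linear interpolation to obtain per-sample coefficients are all assumptions made for illustration.

```python
import numpy as np


def time_varying_lp(excitation, coeffs):
    """Naive sample-wise all-pole filter (illustrative, assumed convention).

    excitation: array of shape (T,)
    coeffs:     array of shape (T, M); coeffs[n, k - 1] is a_k at sample n
    Computes y[n] = e[n] - sum_{k=1..M} a_k[n] * y[n - k].
    """
    T, M = coeffs.shape
    y = np.zeros(T)
    for n in range(T):
        acc = excitation[n]
        for k in range(1, M + 1):
            if n - k >= 0:  # samples before n = 0 are treated as zero
                acc -= coeffs[n, k - 1] * y[n - k]
        y[n] = acc
    return y


def hold_per_frame(frame_coeffs, hop):
    """Frame-wise style: repeat each frame's coefficients for `hop` samples."""
    return np.repeat(frame_coeffs, hop, axis=0)


def interpolate_per_sample(frame_coeffs, hop):
    """Sample-wise style: linearly interpolate coefficients between frames."""
    F, M = frame_coeffs.shape
    frame_pos = np.arange(F) * hop
    sample_pos = np.arange(F * hop)
    return np.stack(
        [np.interp(sample_pos, frame_pos, frame_coeffs[:, k]) for k in range(M)],
        axis=1,
    )


# Hypothetical first-order example: two frames, four samples per frame.
frame_coeffs = np.array([[-0.5], [-0.9]])
hop = 4
e = np.zeros(8)
e[0] = 1.0  # unit impulse as excitation

y_frame = time_varying_lp(e, hold_per_frame(frame_coeffs, hop))
y_sample = time_varying_lp(e, interpolate_per_sample(frame_coeffs, hop))
```

The explicit loop over `n` shows why the recursion is slow when written directly in a deep learning framework: each output sample depends on earlier outputs, so the computation cannot be trivially parallelised over time. The two coefficient schedules also give different outputs for the same frame-level coefficients, which illustrates the kind of mismatch the abstract attributes to frame-wise approximation.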

Taxonomy

Topics: Neural Networks and Applications