Significance of Maximum Spectral Amplitude in Sub-bands for Spectral Envelope Estimation and Its Application to Statistical Parametric Speech Synthesis

Sivanand Achanta; Anandaswarup Vadapalli; Sai Krishna R.; Suryakanth V. Gangashetty

arXiv:1508.00354·cs.SD·August 4, 2015

TL;DR

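This paper introduces MSASB, a spectral envelope estimation technique that uses the maximum spectral amplitude in each sub-band of the Fourier magnitude spectrum. Because it parametrizes the envelope in the spectral domain, it is more readily interpretable than cepstral representations. In analysis-by-synthesis it performs comparably to STRAIGHT, and it is also evaluated in DNN-based statistical parametric speech synthesis.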

Contribution

The paper presents a spectral envelope parametrization built from maximum sub-band spectral amplitudes. It works directly in the spectral domain, which makes the parameters easier to interpret than cepstral coefficients, and it is evaluated in statistical parametric speech synthesis.

Findings

01

The MSASB method performs comparably to STRAIGHT in analysis-by-synthesis, in both objective and subjective evaluations.

02

Parametrizing the envelope in the spectral domain makes it more readily interpretable than cepstral-domain representations.

03

The parametrization is also evaluated in statistical parametric speech synthesis with deep neural networks.

Abstract

In this paper we propose a technique for spectral envelope estimation using the maximum values in sub-bands of the Fourier magnitude spectrum (MSASB). Most other methods in the literature parametrize the spectral envelope in the cepstral domain, for example as Mel-generalized cepstra. Such cepstral-domain representations, although compact, are not readily interpretable. Our method overcomes this difficulty by parametrizing in the spectral domain itself. In our experiments, the spectral envelope estimated using the MSASB method was incorporated into the STRAIGHT vocoder. Both objective and subjective results of analysis-by-synthesis indicate that the proposed method is comparable to STRAIGHT. We also evaluate the effectiveness of the proposed parametrization in a statistical parametric speech synthesis framework using deep neural networks.
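The core idea in the abstract, taking the maximum of the Fourier magnitude spectrum within each sub-band, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `msasb_envelope`, the Hann window, the uniform sub-band edges, the number of bands, and the linear interpolation between peaks are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def msasb_envelope(frame, n_bands=40, n_fft=1024):
    """Illustrative sketch: per-sub-band spectral maxima and a coarse envelope.

    Band layout, window, and interpolation are assumptions, not the paper's.
    """
    window = np.hanning(len(frame))
    # Magnitude spectrum of the windowed frame (n_fft // 2 + 1 bins).
    mag = np.abs(np.fft.rfft(frame * window, n=n_fft))

    # Split the bins into contiguous, roughly equal-width sub-bands (assumed).
    edges = np.linspace(0, len(mag), n_bands + 1).astype(int)

    peaks = np.empty(n_bands)    # maximum amplitude in each sub-band
    centers = np.empty(n_bands)  # bin index where that maximum occurs
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        k = lo + np.argmax(mag[lo:hi])
        peaks[b] = mag[k]
        centers[b] = k

    # Linear interpolation through the peaks gives a full-resolution envelope.
    envelope = np.interp(np.arange(len(mag)), centers, peaks)
    return peaks, envelope

# Synthetic 25 ms frame at 16 kHz: a 200 Hz tone plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
frame = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(400)
peaks, envelope = msasb_envelope(frame)
```

Here `peaks` plays the role of the compact parameter vector, and each entry is simply an amplitude in a known frequency region, which is the sense in which a spectral-domain parametrization is easier to read than cepstral coefficients.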

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing