Significance of Maximum Spectral Amplitude in Sub-bands for Spectral Envelope Estimation and Its Application to Statistical Parametric Speech Synthesis
Sivanand Achanta, Anandaswarup Vadapalli, Sai Krishna R., Suryakanth V. Gangashetty

TL;DR
This paper introduces MSASB, a spectral envelope estimation technique based on maximum spectral amplitudes in sub-bands. It is readily interpretable, performs comparably to STRAIGHT in analysis-by-synthesis, and is applied to statistical parametric speech synthesis.
Contribution
The paper presents a novel spectral envelope parametrization based on maximum sub-band spectral amplitudes. Because it works directly in the spectral domain, it is more readily interpretable than cepstral representations, and it is evaluated for use in speech synthesis.
Findings
The MSASB method performs comparably to STRAIGHT in analysis-by-synthesis, in both objective and subjective evaluations.
The spectral-domain parametrization is more readily interpretable than cepstral-domain representations.
The parametrization is also evaluated in statistical parametric speech synthesis with deep neural networks.
Abstract
In this paper we propose a technique for spectral envelope estimation using the maximum values in the sub-bands of the Fourier magnitude spectrum (MSASB). Most other methods in the literature parametrize the spectral envelope in the cepstral domain, for example as the Mel-generalized cepstrum. Such cepstral-domain representations, although compact, are not readily interpretable. Our method overcomes this difficulty by parametrizing in the spectral domain itself. In our experiments, the spectral envelope estimated using the MSASB method was incorporated into the STRAIGHT vocoder. Both objective and subjective results of analysis-by-synthesis indicate that the proposed method is comparable to STRAIGHT. We also evaluate the effectiveness of the proposed parametrization in a statistical parametric speech synthesis framework using deep neural networks.
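The core idea in the abstract, taking the maximum of the Fourier magnitude spectrum within each sub-band, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name max_subband_amplitudes, the Hann window, the FFT size, the number of bands, and the equal-width band layout are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def max_subband_amplitudes(frame, n_bands=8, n_fft=512):
    """Peak Fourier magnitude in each of n_bands contiguous sub-bands.

    Hypothetical helper: band count, FFT size, window, and equal-width
    band layout are illustrative assumptions, not taken from the paper.
    """
    windowed = frame * np.hanning(len(frame))
    # One-sided magnitude spectrum with n_fft // 2 + 1 bins.
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    # Split the bins into contiguous, nearly equal-width sub-bands.
    bands = np.array_split(magnitude, n_bands)
    # Keep only the largest magnitude found in each sub-band.
    return np.array([band.max() for band in bands])

# Toy frame: a strong 500 Hz tone plus a weaker 3500 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
frame = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 3500 * t)
envelope = max_subband_amplitudes(frame)
```

Each entry of envelope is a magnitude tied to a known frequency range, which is the sense in which a spectral-domain parametrization is easier to read than a vector of cepstral coefficients. In this toy frame the largest value falls in the lowest band, which contains the 500 Hz tone.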
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
