Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation
Detai Xin, Shinnosuke Takamichi, Takuma Okamoto, Hisashi Kawai, Hiroshi Saruwatari

TL;DR
This paper introduces a speaking-rate-controllable HiFi-GAN neural vocoder that changes speech speed by interpolating its mel-spectrograms or hidden features (via signal resampling or image scaling), achieving higher naturalness than a baseline time-scale modification algorithm. The method is validated on a new Japanese speech corpus.
Contribution
It integrates a differentiable interpolation layer into HiFi-GAN for controllable speaking rate adjustment, maintaining high fidelity and efficiency.
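The core operation of such a layer can be pictured as resizing a (channels, frames) feature map along its time axis only. The NumPy sketch below assumes half-pixel linear interpolation, as used by common image-resizing routines, and builds the resize as an explicit matrix so it is visibly a linear, and therefore differentiable, map of the input features. `interpolation_matrix` and `warp_time` are hypothetical names, and the paper's actual layer may use a different kernel or implementation.

```python
import numpy as np


def interpolation_matrix(t_in: int, t_out: int) -> np.ndarray:
    """Build a (t_out, t_in) linear-interpolation matrix along time.

    Hypothetical helper (not from the paper). Uses half-pixel
    (align_corners=False) sampling, as in common image resizers.
    """
    scale = t_in / t_out
    # Centre of each output frame mapped back into input-frame coordinates.
    pos = (np.arange(t_out) + 0.5) * scale - 0.5
    pos = np.clip(pos, 0.0, t_in - 1)
    left = np.floor(pos).astype(int)
    right = np.minimum(left + 1, t_in - 1)
    frac = pos - left
    w = np.zeros((t_out, t_in))
    rows = np.arange(t_out)
    # Each output frame is a convex mix of its two nearest input frames.
    np.add.at(w, (rows, left), 1.0 - frac)
    np.add.at(w, (rows, right), frac)
    return w


def warp_time(features: np.ndarray, rate: float) -> np.ndarray:
    """Stretch or compress (channels, frames) features along time.

    Hypothetical helper: rate > 1 yields fewer frames (faster speech),
    rate < 1 yields more frames (slower speech).
    """
    t_in = features.shape[-1]
    t_out = max(1, int(round(t_in / rate)))
    w = interpolation_matrix(t_in, t_out)
    # The output is linear in `features`, so gradients flow back via w.
    return features @ w.T
```

For example, `warp_time(mel, 1.25)` on an 80 × 100 mel-spectrogram returns 80 × 80 frames, which a vocoder would then render as shorter, faster speech. In a PyTorch model, `torch.nn.functional.interpolate(x, size=t_out, mode="linear", align_corners=False)` on a (batch, channels, frames) tensor performs a comparable resize with autograd support.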
Findings
Outperforms baseline time-scale modification in naturalness
Image scaling of mel-spectrograms yields best performance
Maintains computational efficiency with rate control
Abstract
This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder. The original HiFi-GAN is a high-fidelity, computationally efficient, small-footprint neural vocoder. We attempt to incorporate a speaking-rate control function into HiFi-GAN to improve the accessibility of synthetic speech. The proposed method inserts a differentiable interpolation layer into the HiFi-GAN architecture and implements two warping methods, signal resampling and image scaling, which operate on either the mel-spectrograms or the hidden features of the neural vocoder. We also design and open-source a Japanese speech corpus covering three speaking rates to evaluate the proposed speaking-rate control method. Comprehensive objective and subjective evaluations demonstrate that 1) the proposed method outperforms a baseline time-scale modification algorithm in speech…
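The abstract names signal resampling as one of the two warping methods. As a hedged illustration of that family, the sketch below resamples features along the time axis in the Fourier domain by truncating or zero-padding the spectrum, the same idea behind `scipy.signal.resample`. Whether the paper uses this exact resampler is an assumption, and `fourier_resample` and `change_rate` are hypothetical names.

```python
import numpy as np


def fourier_resample(features: np.ndarray, t_out: int) -> np.ndarray:
    """Resample the last axis to t_out frames via the real FFT.

    Hypothetical helper. Treats each channel as periodic (edges may ring)
    and handles the Nyquist bin only approximately.
    """
    t_in = features.shape[-1]
    spec = np.fft.rfft(features, axis=-1)
    n_out_bins = t_out // 2 + 1
    n_keep = min(spec.shape[-1], n_out_bins)
    out_spec = np.zeros(features.shape[:-1] + (n_out_bins,), dtype=complex)
    # Keep the shared low-frequency bins; drop (downsample) or zero-pad (upsample) the rest.
    out_spec[..., :n_keep] = spec[..., :n_keep]
    # Rescale so the mean level (DC component) is preserved.
    return np.fft.irfft(out_spec, n=t_out, axis=-1) * (t_out / t_in)


def change_rate(features: np.ndarray, rate: float) -> np.ndarray:
    """Hypothetical wrapper: rate > 1 shortens, rate < 1 lengthens."""
    t_out = max(1, int(round(features.shape[-1] / rate)))
    return fourier_resample(features, t_out)
```

Unlike the linear interpolation used in image scaling, this variant band-limits the warped features. The listed findings report that image scaling of mel-spectrograms performed best, so this resampler is shown only to contrast the two approaches.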
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
