TL;DR
FastPitch is a fully-parallel text-to-speech model that predicts pitch contours at inference time. Altering those predictions makes the generated speech more expressive and a better match for the meaning of the utterance, while quality stays high and synthesis runs far faster than real time.
Contribution
It introduces a fully-parallel TTS model conditioned on pitch contours, enabling expressive speech synthesis without additional computational overhead.
Findings
Achieves high-quality speech comparable to state-of-the-art methods.
Enables controllable pitch modulation for expressive speech.
Maintains a real-time factor above 900x for mel-spectrogram synthesis of a typical utterance.
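The speed figure can be read as a simple ratio. The sketch below assumes that "900x real-time" means seconds of audio produced per second of compute (some sources use the inverse convention). The function name and the example numbers are hypothetical and are not measurements from the paper.

```python
def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster).

    Assumption: this is the convention behind "900x real-time";
    the inverse ratio is also in common use.
    """
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Hypothetical numbers chosen for illustration only.
rtf = real_time_factor(audio_seconds=9.0, synthesis_seconds=0.01)
print(f"real-time factor: {rtf:.0f}x")
```

Under this convention, a 9-second utterance whose mel-spectrogram is produced in 10 ms corresponds to a factor of about 900x. The figure covers mel-spectrogram synthesis only, so vocoder time would have to be added to get an end-to-end number.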
Abstract
We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be more expressive, better match the semantics of the utterance, and in the end be more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to the state of the art. It does not introduce an overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with over 900x real-time factor for mel-spectrogram synthesis of a typical utterance.
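The idea of uniformly raising or lowering a predicted pitch contour can be illustrated with a minimal sketch. It assumes a contour stored as per-frame values in Hz, with 0.0 marking unvoiced frames, and a fixed additive offset. The function `shift_pitch` and the sample values are hypothetical, and the model's own internal pitch representation may differ.

```python
from typing import List


def shift_pitch(contour_hz: List[float], offset_hz: float) -> List[float]:
    """Uniformly shift voiced frames by offset_hz.

    Assumption: unvoiced frames are encoded as 0.0 and are left unchanged.
    A voiced frame pushed to zero or below is clamped to 0.0.
    """
    shifted = []
    for f0 in contour_hz:
        if f0 <= 0.0:
            shifted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            shifted.append(max(f0 + offset_hz, 0.0))
    return shifted


# Hypothetical predicted contour: unvoiced, three voiced frames, unvoiced.
predicted = [0.0, 180.0, 200.0, 220.0, 0.0]
raised = shift_pitch(predicted, 50.0)
lowered = shift_pitch(predicted, -50.0)
print(raised)
print(lowered)
```

In a real pipeline, the modified contour would replace the predicted one before the decoder produces the mel-spectrogram. This sketch covers only the contour arithmetic.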
Code & Models
- 🤗 nvidia/tts_en_fastpitch (model) · 272 dl · ♡ 40
- 🤗 inOXcrm/German_multispeaker_FastPitch_nemo (model) · 8 dl · ♡ 2
- 🤗 infinisoft/tts (model) · ♡ 4
- 🤗 theodotus/tts_uk_fastpitch (model) · 10 dl · ♡ 2
- 🤗 Bilgilice/bilgilice35 (model)
- 🤗 Mastering-Python-HF/nvidia_tts_en_fastpitch_multispeaker (model) · 1 dl · ♡ 1
- 🤗 Mastering-Python-HF/nvidia_tts_en_hifitts_hifigan_ft_fastpitch (model) · 17 dl · ♡ 1
- 🤗 Pendrokar/xvasynth_lojban (model) · ♡ 1
- 🤗 praveenchordia/tts (model) · ♡ 1
- 🤗 Pendrokar/xvasynth_lisa_en (model)
Taxonomy
Methods: Linear Layer · Convolution · FastPitch · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam
