LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Masaya Kawamura, Ryuichi Yamamoto, Yuma Shirahata, Takuya Hasumi, Kentaro Tachibana

TL;DR
LibriTTS-P is a new speech corpus with detailed speaking style and speaker prompts, enabling improved controllable TTS and style captioning with higher naturalness and accuracy.
Contribution
It introduces LibriTTS-P, a diverse annotated corpus that pairs human-annotated speaker prompts with synthetic speaking-style prompts, improving both TTS controllability and style captioning performance.
Findings
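TTS models trained on LibriTTS-P achieve higher naturalness than models trained on a conventional dataset.
Style captioning models trained on LibriTTS-P generate 2.5 times more accurate words than those trained on a conventional dataset.
LibriTTS-P provides more diverse prompt annotations than existing English prompt datasets and covers all speakers of LibriTTS-R.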
Abstract
We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: (1) manual annotations that capture human perceptions of speaker characteristics and (2) synthetic annotations on speaking style. Compared to existing English prompt datasets, our corpus provides more diverse prompt annotations for all speakers of LibriTTS-R. Experimental results for prompt-based controllable TTS demonstrate that the TTS model trained with LibriTTS-P achieves higher naturalness than the model using the conventional dataset. Furthermore, the results for style captioning tasks show that the model utilizing LibriTTS-P generates 2.5 times more accurate words than the model using a conventional dataset. Our corpus, LibriTTS-P, is…
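The abstract describes two annotation levels: speaker-level prompts (manual) and utterance-level speaking-style prompts (synthetic). The sketch below shows one way such records could be joined into a single text prompt per utterance for prompt-based TTS. It is only an illustration under stated assumptions: the class names, field names, and the simple "speaker prompt + style prompt" concatenation are hypothetical and are not the official LibriTTS-P file format or the authors' training pipeline.

```python
from dataclasses import dataclass


# Hypothetical record layouts; field names are illustrative only and do not
# reflect the official LibriTTS-P release format.
@dataclass(frozen=True)
class SpeakerPrompt:
    speaker_id: str
    description: str  # speaker-level prompt (manually annotated)


@dataclass(frozen=True)
class StylePrompt:
    utterance_id: str
    speaker_id: str
    description: str  # utterance-level speaking-style prompt (synthetic)


def build_prompts(speakers, styles):
    """Map each utterance ID to its speaker prompt followed by its style prompt.

    The concatenation order is an assumption made for illustration.
    """
    by_speaker = {s.speaker_id: s.description for s in speakers}
    combined = {}
    for style in styles:
        speaker_desc = by_speaker.get(style.speaker_id)
        if speaker_desc is None:
            # Skip utterances whose speaker lacks a speaker-level prompt.
            continue
        combined[style.utterance_id] = f"{speaker_desc} {style.description}"
    return combined
```

For example, with one made-up speaker prompt ("A calm female voice.") for `spk_a` and a made-up style prompt ("She speaks quickly.") for utterance `utt_001`, `build_prompts` returns `{"utt_001": "A calm female voice. She speaks quickly."}`. Keying speaker prompts by ID reflects that one speaker-level description is shared by many utterances.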
Taxonomy
Topics: Natural Language Processing Techniques · Subtitles and Audiovisual Media · Translation Studies and Practices
