Prosody Analysis of Audiobooks
Charuta Pethe, Bach Pham, Felix D Childress, Yunting Yin, Steven Skiena

TL;DR
This paper introduces improved prosody prediction models for audiobook narration, leveraging a new dataset to better emulate human vocal expressiveness compared to existing TTS systems.
Contribution
The study presents a new dataset of 93 aligned book-audiobook pairs and language-model-based predictors of pitch, volume, and rate of speech whose output tracks human narration more closely than a state-of-the-art commercial TTS system.
Findings
Predicted pitch correlates more strongly with human readings than the output of a state-of-the-art commercial TTS system for 22 of the 24 books.
Predicted volume is closer to human narration than the same commercial TTS system for 23 of the 24 books.
Human evaluations favor prosody-enhanced readings over commercial TTS systems.
Abstract
Recent advances in text-to-speech have made it possible to generate natural-sounding audio from text. However, audiobook narrations involve dramatic vocalizations and intonations by the reader, with greater reliance on emotions, dialogues, and descriptions in the narrative. Using our dataset of 93 aligned book-audiobook pairs, we present improved models for predicting prosody properties (pitch, volume, and rate of speech) from narrative text using language modeling. Our predicted prosody attributes correlate much better with human audiobook readings than results from a state-of-the-art commercial TTS system: our predicted pitch shows a higher correlation with human reading for 22 out of the 24 books, while our predicted volume attribute proves more similar to human reading for 23 out of the 24 books. Finally, we present a human evaluation study to quantify the extent that people prefer…
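The per-book comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that each book is reduced to three aligned sequences of a single prosody attribute (for example, one pitch value per sentence) for the human reading, the model prediction, and the TTS baseline, and that "correlation" means the Pearson coefficient. The names `pearson`, `count_wins`, and `toy_books`, and all the numbers, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def count_wins(books):
    """Count books where the model correlates better with the human reading
    than the TTS baseline does (assumed comparison rule)."""
    wins = 0
    for book in books:
        r_model = pearson(book["model"], book["human"])
        r_tts = pearson(book["tts"], book["human"])
        if r_model > r_tts:
            wins += 1
    return wins


# Made-up values for illustration only; not data from the paper.
toy_books = [
    {"human": [1.0, 2.0, 3.0, 4.0],
     "model": [1.1, 1.9, 3.2, 3.8],
     "tts": [2.0, 2.0, 2.1, 1.9]},
    {"human": [4.0, 3.0, 2.0, 1.0],
     "model": [1.0, 2.0, 3.0, 4.0],
     "tts": [4.0, 3.0, 2.0, 1.5]},
]

print(count_wins(toy_books), "of", len(toy_books), "books favor the model")
```

A tally of this kind is what a statement such as "22 out of the 24 books" summarizes: one correlation pair per book, followed by a count of the books where the model's value is higher.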
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
