Expressivity in TTS from Semantics and Pragmatics
Rodolfo Delmonte

TL;DR
This paper discusses the development of SPARSAR, an expressive TTS system capable of rendering poetry and stories with nuanced intonation by analyzing text at multiple linguistic levels and incorporating pragmatic cues.
Contribution
The paper introduces SPARSAR, a novel TTS system that integrates semantic, syntactic, and pragmatic analysis to produce expressive speech for diverse text types.
Findings
SPARSAR can read poetry with expressive intonation.
The system analyzes text at the phonetic, phonological, syntactic, and semantic levels.
Incorporating pragmatic cues enhances speech expressivity.
Abstract
In this paper we present ongoing work to produce an expressive TTS reader that can be used in both text and dialogue applications. The system, called SPARSAR, has so far been used to read (English) poetry, but it can now be applied to any text. The text is fully analyzed at the phonetic and phonological levels, and at the syntactic and semantic levels. In addition, the system has access to a restricted list of typical pragmatically marked phrases and expressions that are used to convey specific discourse functions and speech acts and that need specialized intonational contours. The text is transformed into a poem-like structure, where each line corresponds to a Breath Group that is semantically and syntactically consistent. Stanzas correspond to paragraph boundaries. Analogical parameters are related to ToBI theoretical indices but their number is doubled. In this paper, we concentrate on short stories and…
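The poem-like restructuring described in the abstract can be illustrated with a rough sketch. This is not SPARSAR's method: the abstract says Breath Groups are derived from syntactic and semantic analysis, whereas the sketch below simply assumes that punctuation marks approximate Breath Group boundaries and that blank lines mark paragraphs. The function name `to_poem_like` and the sample text are hypothetical.

```python
import re

def to_poem_like(text):
    """Return a list of stanzas, each a list of lines.

    Assumption (not from the paper): a blank line marks a paragraph
    boundary, and punctuation marks a Breath Group boundary.
    """
    stanzas = []
    # One stanza per paragraph.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Split after , ; : . ! ? when followed by whitespace.
        chunks = re.split(r"(?<=[,;:.!?])\s+", paragraph)
        lines = [chunk.strip() for chunk in chunks if chunk.strip()]
        if lines:
            stanzas.append(lines)
    return stanzas

sample = (
    "When the rain stopped, the fox left its den. It was hungry.\n\n"
    "The hen, however, was gone."
)

for stanza in to_poem_like(sample):
    for line in stanza:
        print(line)
    print()  # blank line between stanzas
```

A real system would replace the punctuation heuristic with constituency or dependency information, so that each line stays syntactically and semantically consistent, as the abstract requires.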
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
