A Computational Memory and Processing Model for Prosody
Janet E. Cahn

TL;DR
This paper presents LOQ, a text-to-speech model that simulates prosody by incorporating limited attention and working memory, capturing stylistic and individual variations in speech intonation.
Contribution
It introduces a novel computational model linking prosody to attentional and memory constraints, enabling varied and naturalistic speech styles.
Findings
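Varying the attentional parameter changes what counts as given and new in a text, and therefore the intonational contours with which it is uttered.
The system currently produces prosody in three styles: child-like, adult expressive, and knowledgeable.
Prosody also differs within each style: no two simulations are alike, which captures some of the stylistic and individual variety found in natural prosody.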
Abstract
This paper links prosody to the information in a text and how it is processed by the speaker. It describes the operation and output of LOQ, a text-to-speech implementation that includes a model of limited attention and working memory. Attentional limitations are key. Varying the attentional parameter in the simulations varies in turn what counts as given and new in a text, and therefore, the intonational contours with which it is uttered. Currently, the system produces prosody in three different styles: child-like, adult expressive, and knowledgeable. This prosody also exhibits differences within each style -- no two simulations are alike. The limited resource approach captures some of the stylistic and individual variety found in natural prosody.
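The link between an attentional limit and the given/new distinction can be illustrated with a toy sketch. This is not LOQ's implementation, which the abstract does not specify. The function name `label_given_new`, the `capacity` parameter, the fixed-size recency buffer, and the rule that only "new" words receive an accent are all assumptions made for illustration.

```python
from collections import deque


def label_given_new(words, capacity):
    """Label each word 'given' if it is still in a bounded memory, else 'new'.

    Hypothetical stand-in for an attentional limit: the memory holds only
    the `capacity` most recent words, so older mentions are forgotten.
    """
    memory = deque(maxlen=capacity)  # oldest items drop out when full
    labels = []
    for word in words:
        key = word.lower()
        status = "given" if key in memory else "new"
        labels.append((word, status))
        memory.append(key)
    return labels


def accent_plan(labels):
    """Assumed mapping: accent 'new' words, deaccent 'given' ones."""
    return [(w, "accent" if s == "new" else "deaccent") for w, s in labels]


text = "the cat saw the dog and the cat ran".split()

# A small capacity forgets earlier mentions, so repeated words count as new.
narrow = label_given_new(text, capacity=2)

# A larger capacity retains them, so repeated words count as given.
wide = label_given_new(text, capacity=8)

print(accent_plan(narrow))
print(accent_plan(wide))
```

Under these assumptions, the same text yields different accent patterns as `capacity` changes, which is the kind of dependence on the attentional parameter the abstract describes.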
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling
