Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De, Ionut Bostan, Nishanth Sastry

TL;DR
This paper presents an end-to-end TTS system that generates emotionally expressive speech for social platform accessibility, integrating text analysis and natural language processing to improve naturalness and real-time performance.
Contribution
The work introduces a context-aware TTS system that derives emotion from text and synthesizes expressive speech, addressing data simplification and duration inaccuracy issues.
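As a rough illustration of what "deriving emotion from text" can mean before synthesis, the toy sketch below tags a post with an emotion label and picks a duration multiplier for it. Everything here is an assumption made for illustration: the keyword lexicon, the `DURATION_SCALE` values, and the `SynthesisRequest` type are hypothetical and are not taken from the paper, whose actual model is not described in this summary.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (assumption); a real system would use a trained classifier.
EMOTION_KEYWORDS = {
    "happy": {"great", "love", "congrats", "yay"},
    "sad": {"sorry", "miss", "loss", "sad"},
    "angry": {"hate", "furious", "angry"},
}

# Hypothetical per-emotion multipliers for a base phoneme duration (assumption).
DURATION_SCALE = {"happy": 0.9, "sad": 1.2, "angry": 0.85, "neutral": 1.0}


@dataclass
class SynthesisRequest:
    """What a downstream synthesizer might receive (illustrative only)."""
    text: str
    emotion: str
    duration_scale: float


def infer_emotion(text: str) -> str:
    """Return the emotion whose keywords occur most often, else 'neutral'."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    counts = {
        emotion: sum(tok in words for tok in tokens)
        for emotion, words in EMOTION_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "neutral"


def build_request(text: str) -> SynthesisRequest:
    """Attach the inferred emotion and its duration multiplier to the text."""
    emotion = infer_emotion(text)
    return SynthesisRequest(text, emotion, DURATION_SCALE[emotion])
```

The point of the sketch is only that the emotion label comes from the text itself rather than from a human annotator, and that the same sentence can map to different durations depending on that label.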
Findings
Inference time competitive with state-of-the-art models
Effective emotion and speaker feature integration for natural speech
Enhanced accessibility for visually impaired users
Abstract
Recent studies have outlined the accessibility challenges that blind or visually impaired people, and less-literate people, face when interacting with social networks, in spite of facilitating technologies such as monotone text-to-speech (TTS) screen readers and audio narration of visual elements such as emojis. Emotional speech generation traditionally relies on human input of the expected emotion together with the text to synthesise, with additional challenges around data simplification (causing information loss) and duration inaccuracy, leading to a lack of expressive emotional rendering. In real-life communication, the duration of phonemes can vary, since the same sentence might be spoken in a variety of ways depending on the speaker's emotional state or accent (referred to as the one-to-many problem of text-to-speech generation). As a result, an advanced voice synthesis system is required to…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Speech and dialogue systems · AI in Service Interactions
