Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia (Adding emotion using a prosody manipulation method for an Indonesian text-to-speech system)
Salita Ulitia Prini, Ary Setijadi Prihatmanto

TL;DR
This paper presents a prosody manipulation method to add emotions like happy, angry, and sad to Indonesian TTS systems, improving naturalness and human-like speech synthesis.
Contribution
It introduces a novel prosody-based emotional filter for Indonesian TTS, enhancing expressiveness and perception of emotions in synthesized speech.
Findings
Perception accuracy for emotions: 95% happy, 96.25% angry, 98.75% sad.
Intelligibility rate of 93.3% for original sentences.
Naturalness perception: 75.6% happy, 73.3% angry, 60% sad.
Abstract
Adding emotion using a prosody manipulation method for an Indonesian text-to-speech system. Text to Speech (TTS) is a system that converts text in a given language into speech, in accordance with how the text is read in that language. This research focuses on natural-sounding speech, that is, on "humanizing" the pronunciation of a TTS voice synthesis system. Humans have emotions and intonation that can affect the sound they produce. The system in this research is built on eSpeak, the MBROLA id1 voice database, and a human speech corpus database taken from a website that lists the most frequently used words (Most Common Words) in a country. Three basic types of emotion/intonation are designed: happy, angry, and sad. The emotional filter is developed by manipulating the relevant prosody features…
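As a rough illustration of what prosody manipulation can look like in an MBROLA-based pipeline, the sketch below rescales the durations and pitch targets in MBROLA phoneme (.pho) text, where each line holds a phoneme, a duration in milliseconds, and optional pairs of position (percent) and pitch (Hz). The function name `apply_emotion`, the `EMOTION_PRESETS` table, and every scale value in it are hypothetical placeholders chosen for illustration; the abstract does not state the paper's actual parameters or implementation.

```python
# Hypothetical presets: emotion -> (pitch scale, duration scale).
# These numbers are illustrative assumptions, not values from the paper.
EMOTION_PRESETS = {
    "happy": (1.20, 0.90),
    "angry": (1.10, 0.80),
    "sad": (0.90, 1.25),
}


def apply_emotion(pho_text: str, emotion: str) -> str:
    """Scale durations and pitch targets in whitespace-separated .pho text."""
    pitch_scale, duration_scale = EMOTION_PRESETS[emotion]
    out_lines = []
    for line in pho_text.splitlines():
        stripped = line.strip()
        # Pass blank lines and ';' comment lines through unchanged.
        if not stripped or stripped.startswith(";"):
            out_lines.append(line)
            continue
        fields = stripped.split()
        phoneme = fields[0]
        duration_ms = round(float(fields[1]) * duration_scale)
        new_fields = [phoneme, str(duration_ms)]
        # Remaining fields are (position %, pitch Hz) pairs.
        targets = fields[2:]
        for i in range(0, len(targets) - 1, 2):
            position = targets[i]
            pitch_hz = round(float(targets[i + 1]) * pitch_scale)
            new_fields += [position, str(pitch_hz)]
        out_lines.append(" ".join(new_fields))
    return "\n".join(out_lines)


if __name__ == "__main__":
    sample = "_ 100\na 80 0 120 100 110"
    print(apply_emotion(sample, "sad"))
```

With these placeholder values, the "sad" preset lowers pitch and lengthens each phoneme, while "happy" and "angry" raise pitch and shorten durations. A real emotional filter would likely also shape the pitch contour and intensity rather than apply a single global scale.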
Taxonomy
Topics: Educational Technology Systems
