EmojiVoice: Towards long-term controllable expressivity in robot speech

Paige Tuttösí; Shivam Mehta; Zachary Syvenky; Bermet Burkanova; Gustav Eje Henter; Angelica Lim

arXiv:2506.15085 · cs.RO · July 31, 2025

TL;DR

EmojiVoice is a customizable TTS toolkit that lets social robots produce expressive speech that varies over long interactions, with fine-grained control. It is demonstrated in three case studies, with improved expressivity shown in storytelling.

Contribution

The paper introduces EmojiVoice, a novel TTS toolkit with emoji-prompting for controllable expressivity in robot speech, suitable for offline deployment and real-time use.

Findings

01. Emoji prompting enhances long-term speech expressivity in storytelling.

02. The expressive voice was less preferred in the robot assistant scenario.

03. Real-time speech generation is feasible with the lightweight Matcha-TTS backbone.

Abstract

Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack the long-term variation found in human speech. Foundation model text-to-speech systems are beginning to mimic the expressivity of human speech, but they are difficult to deploy offline on robots. We present EmojiVoice, a free, customizable text-to-speech (TTS) toolkit that allows social roboticists to build temporally variable, expressive speech on social robots. We introduce emoji-prompting to allow fine-grained control of expressivity at the phrase level and use the lightweight Matcha-TTS backbone to generate speech in real time. We explore three case studies: (1) a scripted conversation with a robot assistant, (2) a storytelling robot, and (3) an autonomous speech-to-speech interactive…
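
To make the idea of phrase-level emoji prompting concrete, here is a minimal sketch of how emoji-tagged text might be split into (phrase, emoji) pairs before synthesis. It is not the EmojiVoice API: the function name `split_emoji_prompt`, the emoji set `STYLE_EMOJIS`, and the convention that an emoji closes the phrase preceding it are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical emoji set for illustration; the emojis a real toolkit
# supports may differ.
STYLE_EMOJIS = ("😊", "😢", "😡", "😮")


def split_emoji_prompt(
    text: str, emojis: Sequence[str] = STYLE_EMOJIS
) -> List[Tuple[str, Optional[str]]]:
    """Split text into (phrase, emoji) pairs.

    Assumption: each emoji marks the style of the phrase that precedes it.
    Trailing text with no emoji is returned with a style of None.
    """
    segments: List[Tuple[str, Optional[str]]] = []
    buffer: List[str] = []
    for ch in text:
        if ch in emojis:
            phrase = "".join(buffer).strip()
            if phrase:  # skip emojis that follow no text
                segments.append((phrase, ch))
            buffer = []
        else:
            buffer.append(ch)
    tail = "".join(buffer).strip()
    if tail:
        segments.append((tail, None))
    return segments


if __name__ == "__main__":
    story = "Once upon a time 😊 the wolf appeared! 😮 The end"
    for phrase, emoji in split_emoji_prompt(story):
        print(repr(phrase), emoji)
```

Each pair could then be handed to a synthesizer one phrase at a time, with the emoji selecting the speaking style; how that selection is actually implemented on top of Matcha-TTS is not specified in the abstract above.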

Code & Models

Repositories

rosielab/emojivoice (PyTorch, official)

Taxonomy

Topics: Digital Communication and Language · Social Robot Interaction and HRI · Speech and dialogue systems