I Know You're Listening: Adaptive Voice for HRI
Paige Tuttösí

TL;DR
This paper introduces an expressive, environment-adaptive, and clarity-optimized voice synthesis system for language teaching robots, with the aim of improving intelligibility for L2 learners.
Contribution
It presents a lightweight expressive TTS system, environmental adaptation techniques, and an L2 clarity mode tailored for language teaching robots, addressing key gaps in task-specific robot voices.
Findings
The expressive voice is more socially appropriate and suitable for storytelling.
Environmental adjustments improve perceived appropriateness and awareness.
The L2 clarity mode reduces transcription errors and improves intelligibility.
Abstract
While the use of social robots for language teaching has been explored, there remains limited work on task-specific synthesized voices for language teaching robots. Given that language is a verbal task, this gap may have severe consequences for the effectiveness of robots for language teaching tasks. We address this lack of L2 teaching robot voices through three contributions: 1. We address the need for a lightweight and expressive robot voice. Using a fine-tuned version of Matcha-TTS, we use emoji prompting to create an expressive voice that shows a range of expressivity over time. The voice can run in real time with limited compute resources. Through case studies, we found this voice more expressive, socially appropriate, and suitable for long periods of expressive speech, such as storytelling. 2. We explore how to adapt a robot's voice to physical and social ambient environments to…
Taxonomy
Topics: Speech and dialogue systems
