Talking to Robots: A Practical Examination of Speech Foundation Models for HRI Applications

Theresa Pekarek Rosin; Julia Gachot; Henri-Leon Kordt; Matthias Kerzel; Stefan Wermter

arXiv:2508.17753 · cs.RO · August 26, 2025

TL;DR

This paper evaluates four state-of-the-art ASR systems on eight public datasets covering six dimensions of difficulty relevant to human-robot interaction. It finds large differences in performance, hallucination tendencies, and bias, all of which affect task performance, user trust, and safety.

Contribution

The paper analyzes state-of-the-art ASR systems under realistic HRI conditions and shows limitations and biases that similar scores on standard benchmarks do not reveal.

Findings

01 Performance varies significantly across conditions

02 Hallucination tendencies differ among systems

03 Biases impact user trust and safety

Abstract

Automatic Speech Recognition (ASR) systems in real-world settings need to handle imperfect audio, often degraded by hardware limitations or environmental noise, while accommodating diverse user groups. In human-robot interaction (HRI), these challenges intersect to create a uniquely challenging recognition environment. We evaluate four state-of-the-art ASR systems on eight publicly available datasets that capture six dimensions of difficulty: domain-specific, accented, noisy, age-variant, impaired, and spontaneous speech. Our analysis demonstrates significant variations in performance, hallucination tendencies, and inherent biases, despite similar scores on standard benchmarks. These limitations have serious implications for HRI, where recognition errors can interfere with task performance, user trust, and safety.
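
The kind of per-condition comparison described above can be sketched in a few lines. This page does not name the metric used, so the sketch assumes word error rate (WER), the usual ASR measure. The function names `word_errors` and `wer_by_condition` and the sample transcripts are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_condition(samples):
    """Pool errors per condition; samples are (condition, reference, hypothesis)."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for condition, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[condition] += e
        words[condition] += n
    return {c: errors[c] / words[c] for c in errors if words[c] > 0}


# Invented transcripts, for illustration only.
samples = [
    ("noisy", "move the red cup", "move the bed cup"),
    ("noisy", "stop", "stop now please"),
    ("accented", "turn left", "turn left"),
]
scores = wer_by_condition(samples)
for condition, wer in sorted(scores.items()):
    print(f"{condition}: WER = {wer:.2f}")
```

Pooling errors and reference words per condition weights long utterances more than short ones. Averaging per-utterance WER is an equally reasonable choice and can give different numbers. Because inserted words count as errors, a transcript with extra words can push WER above 1.0 for a single utterance, which is one simple way that hallucination-like output shows up in this metric.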
