Assessing Evaluation Metrics for Speech-to-Speech Translation
Elizabeth Salesky, Julian Mäder, Severin Klinger

TL;DR
This paper evaluates automatic metrics for speech-to-speech translation, highlights their limitations for low-resource and dialectal languages, and examines how translating into dialectal variants rather than standardized languages affects evaluation.
Contribution
It systematically assesses current evaluation metrics for speech-to-speech translation and examines how they behave when the translation target is a dialectal variant rather than a standardized language.
Findings
The previously used automatic metric is best suited to standardized, high-resource languages.
Translation to dialects impacts the reliability of evaluation metrics.
Evaluation methods need adaptation for low-resource and dialectal language translation.
Abstract
Speech-to-speech translation combines machine translation with speech synthesis, introducing evaluation challenges not present in either task alone. How to automatically evaluate speech-to-speech translation is an open question which has not previously been explored. Translating to speech rather than to text is often motivated by unwritten languages or languages without standardized orthographies. However, we show that the previously used automatic metric for this task is best equipped for standardized high-resource languages only. In this work, we first evaluate current metrics for speech-to-speech translation, and second assess how translation to dialectal variants rather than to standardized languages impacts various evaluation methods.
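The abstract does not name the metrics it compares, so the following is only a toy illustration of the general concern, not the paper's method. It assumes the system output has already been converted to text, and it uses two hypothetical scoring functions (`word_score` and `char_score`, both invented here) on a made-up English sentence pair. The point it sketches is that when an orthography is not standardized, a word-level overlap score treats every spelling variant as a full error, whereas a character-level score gives partial credit.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_f1(hypothesis, reference, n):
    """F1 over clipped n-gram matches between two token sequences."""
    hyp, ref = ngram_counts(hypothesis, n), ngram_counts(reference, n)
    matches = sum((hyp & ref).values())  # multiset intersection clips counts
    if matches == 0:
        return 0.0
    precision = matches / sum(hyp.values())
    recall = matches / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def word_score(hypothesis, reference):
    """Word-unigram F1 (hypothetical helper): a spelling variant is a full miss."""
    return overlap_f1(hypothesis.split(), reference.split(), 1)


def char_score(hypothesis, reference, n=3):
    """Character n-gram F1 (hypothetical helper): variants earn partial credit."""
    return overlap_f1(list(hypothesis), list(reference), n)


# Made-up example: the same sentence under two spelling conventions.
reference = "the colour of the harbour"
variant = "the color of the harbor"

print(f"word-level: {word_score(variant, reference):.2f}")
print(f"char-level: {char_score(variant, reference):.2f}")
```

On this pair the character-level score comes out higher than the word-level score, even though a human reader would judge both spellings equally acceptable. Real metrics differ in many details (smoothing, n-gram orders, brevity penalties), so this should be read as a sketch of the intuition only.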
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
