How to Evaluate Speech Translation with Source-Aware Neural MT Metrics

Mauro Cettolo; Marco Gaido; Matteo Negri; Sara Papi; Luisa Bentivogli

arXiv:2511.03295·cs.CL·April 9, 2026

TL;DR

This paper studies source-aware neural metrics for speech translation (ST) evaluation when source transcripts are unavailable. It examines how to generate a reliable textual proxy for the input audio and introduces a re-segmentation algorithm that makes the metrics more robust.

Contribution

The paper systematically studies source-aware metrics for ST, compares ASR transcripts and back-translations as textual proxies for the source audio, and proposes a re-segmentation method that makes these metrics more robust.

Findings

1. ASR transcripts are more reliable than back-translations when WER < 20%

2. Back-translations are computationally cheaper and still effective

3. The re-segmentation algorithm improves robustness of source-aware metrics
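
The WER threshold in the first finding can be illustrated with a minimal sketch. It computes word error rate as word-level edit distance divided by reference length, then picks a source proxy. The function names `word_error_rate` and `choose_source_proxy` are hypothetical, and two points are assumptions rather than the paper's procedure: that WER is estimated on held-out data where reference transcripts exist, and that back-translation is the fallback when WER reaches 20% or more.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def choose_source_proxy(wer: float, threshold: float = 0.20) -> str:
    """Assumed decision rule: use the ASR transcript below the threshold,
    otherwise fall back to back-translation."""
    return "asr" if wer < threshold else "back-translation"


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.3f}, proxy = {choose_source_proxy(wer)}")
```

With one substituted word out of six, the example yields a WER of about 0.167, so the assumed rule selects the ASR transcript. In practice the WER would be estimated over a whole development set rather than a single sentence.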

Abstract

Automatic evaluation of ST systems is typically performed by comparing translation hypotheses with one or more reference translations. While effective to some extent, this approach inherits the limitation of reference-based evaluation that ignores valuable information from the source input. In MT, recent progress has shown that neural metrics incorporating the source text achieve stronger correlation with human judgments. Extending this idea to ST, however, is not trivial because the source is audio rather than text, and reliable transcripts or alignments between source and references are often unavailable. In this work, we conduct the first systematic study of source-aware metrics for ST, with a particular focus on real-world operating conditions where source transcripts are not available. We explore two complementary strategies for generating textual proxies of the input audio, ASR…
