MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation

Dominik Macháček; Ondřej Bojar; Raj Dabre

arXiv:2211.08633 · cs.CL · June 2, 2023


TL;DR

This study analyzes how well offline MT evaluation metrics correlate with human ratings in simultaneous speech translation, finding they are reliable proxies under current quality levels, especially when using translation as a reference.

Contribution

The paper provides an extensive correlation analysis between offline MT metrics and human ratings in SST, demonstrating their reliability and limitations for evaluation.

Findings

01

Offline metrics are well correlated with human ratings in SST.

02

Metrics correlate more strongly with human ratings when the reference is a translation than when it is a transcript of interpreting.

03

Metrics can serve as proxies for human evaluation, reducing the need for large-scale human ratings.

Abstract

There have been several meta-evaluation studies on the correlation between human ratings and offline machine translation (MT) evaluation metrics such as BLEU, chrF2, BertScore and COMET. These metrics have been used to evaluate simultaneous speech translation (SST), but their correlations with human ratings of SST, which have recently been collected as Continuous Ratings (CR), are unclear. In this paper, we leverage the evaluations of candidate systems submitted to the English-German SST task at IWSLT 2022 and conduct an extensive correlation analysis of CR and the aforementioned metrics. Our study reveals that the offline metrics are well correlated with CR and can be reliably used for evaluating machine translation in simultaneous mode, with some limitations on the test set size. We conclude that given the current quality levels of SST, these metrics can be used as proxies for CR,…
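As a rough illustration of what a correlation analysis between an automatic metric and human ratings involves, the sketch below computes Pearson's r between per-document metric scores and averaged human ratings. It is not the authors' code: the function name `pearson`, the choice of Pearson's r, and all the numbers are assumptions made for illustration only, and the values are invented rather than taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one automatic metric score and one averaged human
# rating per document. These values are made up, not results from the paper.
metric_scores = [18.2, 21.5, 24.9, 27.3, 30.1]
human_ratings = [2.1, 2.6, 2.9, 3.4, 3.6]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the metric ranks and spaces the documents much as the human raters do. With only a handful of items the estimate is unstable, which is in line with the abstract's caveat about test set size.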


Code & Models

Repositories

ufal/MT-metrics-in-SimST
none · Official


Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification

Methods: Test