Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation

Xiaoman Wang; Claudio Fantinuoli

arXiv:2406.10091·cs.CL·June 17, 2024·1 cites

Open Access

TL;DR

This study evaluates the reliability of automatic metrics, especially GPT models, in assessing the quality of simultaneous speech translation by analyzing their correlation with human judgments of translation faithfulness.

Contribution

It demonstrates that GPT-3.5 with direct prompting correlates strongly with human assessments in evaluating translation accuracy without reference texts.

Findings

01. GPT-3.5 shows high correlation with human judgment

02. Context window size affects evaluation accuracy

03. Semantic similarity metrics are effective for assessment

Abstract

Assessing the performance of interpreting services is a complex task, given the nuanced nature of spoken language translation, the strategies that interpreters apply, and the diverse expectations of users. The complexity of this task becomes even more pronounced when automated evaluation methods are applied. This is particularly true because interpreted texts exhibit less linearity between the source and target languages due to the strategies employed by the interpreter. This study aims to assess the reliability of automatic metrics in evaluating simultaneous interpretations by analyzing their correlation with human evaluations. We focus on a particular feature of interpretation quality, namely translation accuracy or faithfulness. As a benchmark we use human assessments performed by language experts, and evaluate how well sentence embeddings and Large Language Models correlate with…
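
As a rough illustration of the kind of analysis the abstract describes, the sketch below scores each source/interpretation pair by the cosine similarity of its sentence embeddings and then correlates those scores with human ratings. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the embedding vectors and human scores are made-up toy values rather than data from the paper, and Pearson and Spearman are shown only because they are common choices (the excerpt above does not say which coefficient the study uses).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy stand-ins for sentence embeddings (ASSUMPTION: real embeddings would come
# from a sentence-embedding model and have hundreds of dimensions).
source_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.5, 0.5, 0.5], [0.0, 0.3, 0.9]]
target_embeddings = [[0.8, 0.2, 0.1], [0.7, 0.1, 0.3], [0.5, 0.4, 0.6], [0.1, 0.2, 0.9]]

# Toy human faithfulness ratings, one per segment (ASSUMPTION: invented values
# on an arbitrary scale).
human_scores = [5, 2, 4, 5]

metric_scores = [
    cosine_similarity(src, tgt)
    for src, tgt in zip(source_embeddings, target_embeddings)
]

print("Pearson :", pearson(metric_scores, human_scores))
print("Spearman:", spearman(metric_scores, human_scores))
```

Spearman is included alongside Pearson because human ratings are often ordinal, in which case rank agreement can be more meaningful than a linear fit.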

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Discriminative Fine-Tuning · Softmax · Layer Normalization · Focus