SpeechQE: Estimating the Quality of Direct Speech Translation

HyoJung Han; Kevin Duh; Marine Carpuat

arXiv:2410.21485·cs.CL·October 30, 2024

TL;DR

This paper introduces SpeechQE, the task of estimating the quality of direct speech translation, together with a benchmark and a family of systems. Its results suggest that end-to-end models are better suited to the task than cascaded approaches.

Contribution

It formulates the SpeechQE task, constructs a benchmark, and evaluates both cascaded and end-to-end systems, including a novel end-to-end system built on a pre-trained text LLM.

Findings

01 End-to-end models outperform cascaded systems in quality estimation.

02 Pre-trained text LLMs enhance end-to-end speech translation quality estimation.

03 The paper releases data and models to foster further research.

Abstract

Recent advances in automatic quality estimation for machine translation have exclusively focused on written language, leaving the speech modality underexplored. In this work, we formulate the task of quality estimation for speech translation (SpeechQE), construct a benchmark, and evaluate a family of systems based on cascaded and end-to-end architectures. In this process, we introduce a novel end-to-end system leveraging a pre-trained text LLM. Results suggest that end-to-end approaches are better suited to estimating the quality of direct speech translation than using quality estimation systems designed for text in cascaded systems. More broadly, we argue that quality estimation of speech translation needs to be studied as a separate problem from that of text, and release our data and models to guide further research in this space.
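
The difference between the two architectures named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the type aliases, function names, and toy stand-ins below are all hypothetical, and real systems would plug in an ASR model, a text QE model, or a speech-aware LLM in their place.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for illustration only.
Audio = Sequence[float]                      # raw waveform samples
ASR = Callable[[Audio], str]                 # speech -> source transcript
TextQE = Callable[[str, str], float]         # (source text, hypothesis) -> score
SpeechQE = Callable[[Audio, str], float]     # (source audio, hypothesis) -> score


def cascaded_speech_qe(audio: Audio, hypothesis: str,
                       asr: ASR, text_qe: TextQE) -> float:
    """Cascaded: transcribe first, then score with a text-only QE system.

    Any ASR error is passed on to the QE step, which never sees the audio.
    """
    transcript = asr(audio)
    return text_qe(transcript, hypothesis)


def end_to_end_speech_qe(audio: Audio, hypothesis: str,
                         model: SpeechQE) -> float:
    """End-to-end: a single model scores the translation directly from audio."""
    return model(audio, hypothesis)


# Toy stand-ins (assumptions) so the sketch runs without any real models.
def toy_asr(audio: Audio) -> str:
    return "hello world"


def toy_text_qe(source: str, hypothesis: str) -> float:
    # Crude length-ratio score in [0, 1]; a placeholder, not a real QE metric.
    s, h = len(source.split()), len(hypothesis.split())
    return min(s, h) / max(s, h) if max(s, h) else 0.0


def toy_speech_qe(audio: Audio, hypothesis: str) -> float:
    return 0.5  # constant placeholder score


if __name__ == "__main__":
    audio = [0.0, 0.1, -0.1]
    print(cascaded_speech_qe(audio, "hallo welt", toy_asr, toy_text_qe))
    print(end_to_end_speech_qe(audio, "hallo welt", toy_speech_qe))
```

The cascaded function makes the structural point visible: the text QE system only ever receives the transcript, so it cannot recover from transcription errors, whereas the end-to-end model has access to the source speech itself.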

Code & Models

Repositories

h-j-han/speechqe (PyTorch, official)

Datasets

h-j-han/SpeechQE-CoVoST2
dataset · 116 downloads

Videos

SpeechQE: Estimating the Quality of Direct Speech Translation · Underline

Taxonomy

Topics: Natural Language Processing Techniques