Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent

Shanbo Cheng; Zhichao Huang; Tom Ko; Hang Li; Ningxin Peng; Lu Xu; Qini Zhang

arXiv:2407.21646 · cs.CL · September 2, 2024 · 1 citation

TL;DR

This paper introduces CLASI, a high-quality simultaneous speech translation (SiST) system inspired by professional human interpreters. It balances translation quality and latency with a data-driven read-write strategy, uses multi-modal retrieval to handle domain-specific terminology, and outperforms existing systems by significant margins.

Contribution

The paper presents CLASI, an LLM-based agent that combines a data-driven read-write strategy with multi-modal retrieval to improve the quality and robustness of real-time speech translation, especially in challenging scenarios.
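The read-write loop described above can be sketched generically: the system keeps reading source chunks until a policy decides to write, then produces output conditioned on the buffered input, the translation history, and retrieved knowledge. This is a minimal sketch under stated assumptions, not CLASI's implementation: `simultaneous_translate`, `should_write`, `translate`, and `retrieve` are hypothetical names, and the toy fixed-size policy in the usage example stands in for the learned, data-driven policy the paper describes.

```python
from typing import Callable, Dict, List, Sequence


def simultaneous_translate(
    chunks: Sequence[str],
    should_write: Callable[[List[str]], bool],
    translate: Callable[[List[str], List[str], List[str]], str],
    retrieve: Callable[[List[str]], List[str]],
) -> List[str]:
    """Run a generic read-write loop over incoming source chunks.

    All three callables are hypothetical stand-ins; in CLASI the write
    decision and the translation come from an LLM agent, not from stubs.
    """
    buffer: List[str] = []   # source chunks read but not yet translated
    history: List[str] = []  # translations already emitted (used as context)
    for chunk in chunks:
        buffer.append(chunk)              # READ: consume one more chunk
        if should_write(buffer):          # policy decides whether to WRITE
            knowledge = retrieve(buffer)  # e.g. glossary hits for the buffer
            history.append(translate(buffer, history, knowledge))
            buffer = []
    if buffer:                            # flush leftover input at end of stream
        history.append(translate(buffer, history, retrieve(buffer)))
    return history


# Usage with toy components (hypothetical, for illustration only).
glossary: Dict[str, str] = {"LLM": "large language model"}
out = simultaneous_translate(
    ["the", "LLM", "agent", "reads", "audio"],
    should_write=lambda buf: len(buf) >= 2,  # toy fixed-size policy
    translate=lambda buf, hist, kb: " ".join(buf).upper(),
    retrieve=lambda buf: [glossary[w] for w in buf if w in glossary],
)
print(out)  # ['THE LLM', 'AGENT READS', 'AUDIO']
```

Keeping the policy, translator, and retriever as separate callables makes the quality/latency trade-off explicit: a policy that writes sooner lowers latency but gives the translator less context per write.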

Findings

1. CLASI achieves a valid information proportion (VIP) of 81.3% on Chinese-to-English and 78.0% on English-to-Chinese translation.
2. It significantly outperforms state-of-the-art systems on real-world and hard datasets.
3. It remains robust on disfluent, informal speech.

Abstract

In this paper, we present Cross Language Agent - Simultaneous Interpretation (CLASI), a high-quality, human-like Simultaneous Speech Translation (SiST) system. Inspired by professional human interpreters, we use a novel data-driven read-write strategy to balance translation quality and latency. To address the challenge of translating in-domain terminology, CLASI employs a multi-modal retrieval module to obtain relevant information that augments the translation. Supported by LLMs, our approach can generate error-tolerant translations by considering the input audio, historical context, and retrieved information. Experimental results show that our system outperforms other systems by significant margins. In line with professional human interpreters, we evaluate CLASI with a better human evaluation metric, valid information proportion (VIP), which measures the amount of…
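As its name suggests, VIP is a proportion. The abstract is cut off before it completes the definition, so the sketch below assumes that annotators split the source into information units and judge each one as validly conveyed or not, and that VIP is the share judged valid. The function name and the toy labels are hypothetical and are not data from the paper.

```python
from typing import Sequence


def valid_information_proportion(judgments: Sequence[bool]) -> float:
    """Fraction of source information units judged validly conveyed.

    Assumption: each element is one human judgment of whether a source
    segment's information survived translation (True = valid).
    """
    if not judgments:
        raise ValueError("need at least one judged segment")
    return sum(judgments) / len(judgments)  # True counts as 1, False as 0


# Toy labels, not data from the paper: 4 of 5 units judged valid.
print(f"{valid_information_proportion([True, True, False, True, True]):.1%}")  # 80.0%
```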

Code & Models

Repositories

byteresearchcla/realsi (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems