On the Hallucination in Simultaneous Machine Translation

Meizhi Zhong; Kehai Chen; Zhengshan Xue; Lemao Liu; Mingming Yang; Min Zhang

arXiv:2406.07239·cs.CL·June 12, 2024

TL;DR

This paper analyzes hallucination issues in Simultaneous Machine Translation, revealing how target-side information influences hallucination and suggesting that reducing reliance on target context can mitigate the problem.

Contribution

It provides a comprehensive analysis of hallucination in SiMT, focusing on distribution and target-side context, and proposes a method to alleviate hallucination by limiting target information usage.

Findings

01

Hallucination words are influenced by target-side context.

02

Reducing target-side information decreases hallucination.

03

Understanding hallucination distribution aids in developing better SiMT models.

Abstract

It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information. While many efforts have been made to enhance the performance of SiMT, few attempt to understand and analyze hallucination in SiMT. Therefore, we conduct a comprehensive analysis of hallucination in SiMT from two perspectives: the distribution of hallucination words and their use of target-side context. Extensive experiments yield several valuable findings and, in particular, show that hallucination can be alleviated by reducing the overuse of target-side information in SiMT.
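To make the idea of counting hallucination words concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's metric: it assumes word alignments are available and treats a target word as hallucinated when none of its aligned source words had been read at the time it was emitted. The function name `hallucination_rate` and its inputs are hypothetical.

```python
from typing import Dict, List, Set


def hallucination_rate(
    alignments: Dict[int, Set[int]],
    num_target_words: int,
    source_read_at_step: List[int],
) -> float:
    """Fraction of target words with no aligned source word already read.

    alignments: target index -> set of aligned source indices (0-based).
    source_read_at_step[t]: number of source words read when target
        word t was emitted (assumed to come from the read/write policy).
    """
    if num_target_words == 0:
        return 0.0
    hallucinated = 0
    for t in range(num_target_words):
        # Keep only aligned source words that were visible at emission time.
        visible = {
            s for s in alignments.get(t, set()) if s < source_read_at_step[t]
        }
        if not visible:
            hallucinated += 1
    return hallucinated / num_target_words


# Toy example: 4 target words, one more source word read before each output.
toy_alignments = {0: {0}, 1: {3}, 2: {1}}  # target word 3 is unaligned
toy_reads = [1, 2, 3, 4]
rate = hallucination_rate(toy_alignments, 4, toy_reads)
```

In the toy example, target word 1 is aligned only to a source word that has not been read yet, and target word 3 has no alignment at all, so two of the four target words count as hallucinated under this assumed definition.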

Code & Models

Repositories

zhongmz/SiMT-Hallucination
PyTorch · Official

Taxonomy

Topics: Big Data and Digital Economy · Biomedical Text Mining and Ontologies · Algorithms and Data Compression