Full-Sentence Models Perform Better in Simultaneous Translation Using the Information Enhanced Decoding Strategy
Zhengxin Yang

TL;DR
This paper introduces a full-sentence model framework with an information-enhanced decoding strategy for simultaneous translation, reducing resource costs and improving translation quality across multiple language directions.
Contribution
It proposes a novel full-sentence decoding framework that enables arbitrary latency, reduces computational costs, and enhances information encoding for simultaneous translation.
Findings
Outperforms baseline models in translation quality on four language directions.
Reduces computational resource requirements compared to prefix-to-prefix models.
Achieves flexible latency control with a single model.
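As a rough illustration of how one full-sentence model could serve any latency at inference time, the sketch below runs a wait-k style read/write schedule around a model callable. The wait-k schedule, the `wait_k_decode` function, the `model(source_prefix, target_prefix)` interface, and `toy_model` are all assumptions made for this example. None of them come from the paper, whose actual decoding strategy is not spelled out in the summary here.

```python
from typing import Callable, List, Sequence, Tuple

EOS = "</s>"

# Hypothetical interface: given the source prefix read so far and the target
# prefix written so far, return the next target token.
NextToken = Callable[[Sequence[str], Sequence[str]], str]


def wait_k_decode(
    model: NextToken, source: Sequence[str], k: int, max_len: int = 100
) -> Tuple[List[str], List[int]]:
    """Decode with an assumed wait-k schedule; k is chosen at inference time."""
    if k < 1:
        raise ValueError("k must be at least 1")
    target: List[str] = []
    schedule: List[int] = []  # source tokens visible when each target token was written
    while len(target) < max_len:
        # Before writing target token t (0-indexed), k + t source tokens have been read.
        visible = min(k + len(target), len(source))
        token = model(source[:visible], target)
        if token == EOS:
            break
        target.append(token)
        schedule.append(visible)
    return target, schedule


def toy_model(src_prefix: Sequence[str], tgt_prefix: Sequence[str]) -> str:
    """Stand-in for a trained model: uppercases the aligned source token."""
    i = len(tgt_prefix)
    return src_prefix[i].upper() if i < len(src_prefix) else EOS


if __name__ == "__main__":
    sentence = ["a", "b", "c", "d"]
    for k in (1, 2, 10):
        print(k, wait_k_decode(toy_model, sentence, k))
```

The same `model` object is reused for every value of `k`, so changing latency only changes how much of the source is visible at each step and requires no retraining.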
Abstract
Simultaneous translation, which starts translating each sentence after receiving only a few words of the source sentence, plays a vital role in many scenarios. Although the previous prefix-to-prefix framework is considered suitable for simultaneous translation and achieves good performance, it has two unavoidable drawbacks: high computational cost, because a separate model must be trained for each latency, and limited ability to encode information, because each target token can attend only to a specific source prefix. We propose a novel framework that adopts a simple but effective decoding strategy designed for full-sentence models. Within this framework, training a single full-sentence model can achieve any given latency and save computational resources. Besides, with the competence of the full-sentence model to encode the whole sentence, our…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
