Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation
Matthew Raffel, Victor Agostinelli, Lizhong Chen

TL;DR
This paper introduces SimulMask, a novel attention masking approach for fine-tuning large language models in simultaneous translation, achieving better translation quality and efficiency than existing prompting-based methods.
Contribution
The paper presents SimulMask, a new paradigm that models simultaneous translation through attention masking during fine-tuning, overcoming limitations of prompting optimization strategies.
Findings
SimulMask significantly improves translation quality over prompting optimization methods.
It reduces the computational cost of fine-tuning.
It is effective across five language pairs on the IWSLT 2017 dataset.
Abstract
Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompting optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such as unnecessarily expanded training sets, computational inefficiency from dumping the key and value cache, increased prompt sizes, or restriction to a single decision policy. To eliminate these issues, in this work, we propose SimulMask, a new paradigm for fine-tuning LLMs for simultaneous translation. It utilizes a novel attention mask approach that models simultaneous translation during fine-tuning by masking attention for a desired decision policy. Applying the proposed SimulMask on a Falcon…
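To make the idea of "masking attention for a desired decision policy" concrete, here is a minimal sketch of such a mask for a wait-k policy. It is an illustration under stated assumptions, not the authors' construction: the function name `wait_k_attention_mask`, the prompt layout (all source tokens followed by all target tokens), and the rule that the target position with 0-based index t sees the first min(k + t, S) source tokens are all hypothetical simplifications chosen for this example.

```python
def wait_k_attention_mask(num_source, num_target, k):
    """Build a boolean attention mask for a decoder-only model.

    mask[q][c] is True when query position q may attend to position c.
    Assumed layout (hypothetical): source tokens occupy positions
    0..num_source-1, target tokens occupy the positions after them.
    """
    total = num_source + num_target
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        if q < num_source:
            # Source queries use ordinary causal attention.
            for c in range(q + 1):
                mask[q][c] = True
        else:
            t = q - num_source  # 0-based index of this target position
            # Assumed wait-k rule: only the first k + t source tokens
            # have been read when this target position is processed.
            visible_source = min(k + t, num_source)
            for c in range(visible_source):
                mask[q][c] = True
            # Target queries attend causally to earlier target tokens.
            for c in range(num_source, q + 1):
                mask[q][c] = True
    return mask


if __name__ == "__main__":
    # Print the mask for 4 source tokens, 3 target tokens, wait-2.
    for row in wait_k_attention_mask(4, 3, 2):
        print("".join("1" if allowed else "0" for allowed in row))
```

Because the policy is encoded in the mask rather than in the prompt, one full source-target pair can in principle be used as a single training sequence, which is the kind of saving the abstract contrasts with expanded training sets and larger prompts.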
