Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation
Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Yang Feng

TL;DR
This paper introduces LSG, a framework in which an off-the-shelf large language model acts as the read/write policy-maker for simultaneous generation, deciding when to emit output so as to balance latency and quality. It reports state-of-the-art results in simultaneous translation and demonstrates practicality in streaming automatic speech recognition.
Contribution
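It proposes the LLM-driven Simultaneous Generation (LSG) framework, which lets an off-the-shelf LLM decide the generation timing while producing output, a policy-making role that LLMs have struggled to take on through traditional training methods.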
Findings
Achieves state-of-the-art performance in simultaneous translation
Demonstrates practicality in streaming automatic speech recognition
Utilizes open-source LLMs effectively in real-world scenarios
Abstract
Simultaneous generation models write generation results while reading streaming inputs, necessitating a policy-maker to determine the appropriate output timing. Existing simultaneous generation methods generally adopt the traditional encoder-decoder architecture and learn the generation and policy-making capabilities through complex dynamic programming techniques. Although LLMs excel at text generation, they face challenges in taking on the role of policy-makers through traditional training methods, limiting their exploration in simultaneous generation. To overcome these limitations, we propose a novel LLM-driven Simultaneous Generation (LSG) framework, which allows the off-the-shelf LLM to decide the generation timing and produce output concurrently. Specifically, LSG selects the generation policy that minimizes latency as the baseline policy. Referring to the baseline policy, LSG…
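The read/write loop described in the abstract can be illustrated with a minimal sketch. It is not the LSG method: the policy here is a fixed wait-k rule swapped in as a stand-in for the LLM-driven decision, and the names `wait_k_policy`, `simultaneous_generate`, and `toy_generate_next` are hypothetical, introduced only for illustration. The sketch shows only how a policy-maker interleaves READ and WRITE actions over a streaming input.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical interface (not from the paper): a policy sees how many source
# tokens have been read and how many target tokens written, and picks an action.
Policy = Callable[[int, int], str]


def wait_k_policy(k: int) -> Policy:
    """Fixed wait-k rule, used here as a stand-in policy: write only when
    the reader is at least k tokens ahead of the writer."""
    def decide(num_read: int, num_written: int) -> str:
        return "WRITE" if num_read - num_written >= k else "READ"
    return decide


def simultaneous_generate(
    source: Iterable[str],
    policy: Policy,
    generate_next: Callable[[List[str], List[str]], Optional[str]],
    max_len: int = 50,
) -> Tuple[List[str], List[str]]:
    """Interleave reading the stream and writing output under `policy`."""
    stream = iter(source)
    read: List[str] = []
    written: List[str] = []
    actions: List[str] = []
    exhausted = False
    while len(written) < max_len:
        if not exhausted and policy(len(read), len(written)) == "READ":
            try:
                read.append(next(stream))
                actions.append("R")
            except StopIteration:
                exhausted = True  # no more input: only WRITE from now on
            continue
        token = generate_next(read, written)
        if token is None:  # generator signals end of output
            break
        written.append(token)
        actions.append("W")
    return written, actions


def toy_generate_next(read: List[str], written: List[str]) -> Optional[str]:
    """Toy generator (assumption): 'translates' by upper-casing source tokens."""
    i = len(written)
    return read[i].upper() if i < len(read) else None


written, actions = simultaneous_generate(
    ["a", "b", "c", "d"], wait_k_policy(2), toy_generate_next
)
print(written)            # ['A', 'B', 'C', 'D']
print("".join(actions))   # RRWRWRWW
```

A smaller k writes earlier (lower latency) with less source context per output token; a larger k does the opposite. That is the latency/quality trade-off the policy-maker has to manage.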
Taxonomy
Topics: Topic Modeling
Methods: ADaptive gradient method with the OPTimal convergence rate
