Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models
Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

TL;DR
Agent-SiMT combines a traditional SiMT policy model with a large language model to improve translation quality, and reports state-of-the-art performance in simultaneous machine translation.
Contribution
This paper introduces a framework that pairs a policy-decision agent, managed by a conventional SiMT model, with an LLM-based translation agent to improve SiMT performance.
Findings
Achieves state-of-the-art results in SiMT tasks.
Effectively combines policy decision and translation generation.
Demonstrates superior translation quality over existing methods.
Abstract
Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations. While they excel at determining policies, their translation performance is suboptimal. Conversely, Large Language Models (LLMs), trained on extensive corpora, possess superior generation capabilities, but it is difficult for them to acquire translation policy through the training methods of SiMT. Therefore, we introduce Agent-SiMT, a framework combining the strengths of LLMs and traditional SiMT methods. Agent-SiMT contains the policy-decision agent and the translation agent. The policy-decision agent is managed by a SiMT model, which…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
