Cogs in a Machine, Doing What They're Meant to Do -- The AMI Submission to the WMT24 General Translation Task
Atli Jasonarson, Hinrik Hafsteinsson, Bjarki Ármannsson, Steinþór Steingrímsson

TL;DR
This paper details the AMI team's submission to the WMT24 General Translation Task for the English→Icelandic direction. The system combines four translation models with a grammar correction model, trained on aggressively filtered human-translated data and synthetic data, part of it generated by an LLM.
Contribution
The paper introduces a multi-model translation system with carefully curated datasets and synthetic data augmentation using LLMs for improved English-Icelandic translation.
Findings
Synthetic data from LLMs significantly improves translation quality.
Careful dataset filtering enhances system performance.
Multiple models and grammar correction contribute to better output.
Abstract
This paper presents the submission of the Árni Magnusson Institute's team to the WMT24 General translation task. We work on the English→Icelandic translation direction. Our system comprises four translation models and a grammar correction model. For training our models we carefully curate our datasets, aggressively filtering out sentence pairs that may detrimentally affect the quality of our system's output. Some of our data are collected from human translations and some are synthetically generated. A part of the synthetic data is generated using an LLM, and we find that it increases the translation capability of our system significantly.
Taxonomy
Topics: Natural Language Processing Techniques
