Fine-tuning Large Language Models for Adaptive Machine Translation
Yasmin Moslem, Rejwanul Haque, Andy Way

TL;DR
This paper shows that fine-tuning the Mistral 7B large language model on a small dataset improves its real-time adaptive machine translation, surpassing the baseline model and reaching quality comparable to commercial LLMs.
Contribution
The study introduces a fine-tuning approach for Mistral 7B that improves its zero-shot and one-shot translation performance in the medical domain, using minimal data and without extensive task-specific training.
Findings
Fine-tuned Mistral 7B outperforms the baseline Mistral 7B in zero-shot translation.
The fine-tuned model surpasses ChatGPT "gpt-3.5-turbo" in zero-shot translation quality.
Its one-shot (adaptive) translation quality is comparable to that of commercial LLMs such as ChatGPT.
Abstract
This paper presents the outcomes of fine-tuning Mistral 7B, a general-purpose large language model (LLM), for adaptive machine translation (MT). The fine-tuning process involves utilising a combination of zero-shot and one-shot translation prompts within the medical domain. The primary objective is to enhance real-time adaptive MT capabilities of Mistral 7B, enabling it to adapt translations to the required domain at inference time. The results, particularly for Spanish-to-English MT, showcase the efficacy of the fine-tuned model, demonstrating quality improvements in both zero-shot and one-shot translation scenarios, surpassing Mistral 7B's baseline performance. Notably, the fine-tuned Mistral outperforms ChatGPT "gpt-3.5-turbo" in zero-shot translation while achieving comparable one-shot translation quality. Moreover, the zero-shot translation of the fine-tuned Mistral matches NLLB…
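To make the difference between the two prompt types concrete, here is a minimal sketch of how zero-shot and one-shot translation prompts might be assembled. The abstract does not give the prompt template, so the "Spanish: ... / English: ..." layout, the function name build_prompt, and the example sentences are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Optional, Tuple


def build_prompt(
    source: str,
    fuzzy_match: Optional[Tuple[str, str]] = None,
    src_lang: str = "Spanish",
    tgt_lang: str = "English",
) -> str:
    """Build a translation prompt (hypothetical template, not from the paper).

    With fuzzy_match=None the prompt is zero-shot: only the new source
    sentence is given. With a (source, target) pair the prompt is one-shot:
    the example translation precedes the new source sentence so the model
    can adapt to its terminology and style.
    """
    lines = []
    if fuzzy_match is not None:
        example_src, example_tgt = fuzzy_match
        lines.append(f"{src_lang}: {example_src}")
        lines.append(f"{tgt_lang}: {example_tgt}")
    lines.append(f"{src_lang}: {source}")
    # The prompt ends with the target-language label; the model completes it.
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Zero-shot: no in-domain example is supplied.
zero_shot = build_prompt("El paciente tiene fiebre.")

# One-shot: a similar, previously translated sentence is supplied as context.
one_shot = build_prompt(
    "El paciente tiene fiebre alta.",
    fuzzy_match=("El paciente tiene fiebre.", "The patient has a fever."),
)
```

In an adaptive MT setting, the example pair would typically be a similar in-domain sentence retrieved at inference time; how that retrieval is done is outside the scope of this sketch.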
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
