On Instruction-Finetuning Neural Machine Translation Models
Vikas Raunak, Roman Grundkiewicz, Marcin Junczys-Dowmunt

TL;DR
This paper introduces instruction finetuning for neural machine translation models, enabling them to follow multiple instructions and perform diverse translation tasks efficiently, similar to large language models.
Contribution
It presents an instruction-finetuning recipe for NMT models that supports multi-task and multi-modal translation as well as zero-shot composition of instructions.
Findings
NMT models can follow multiple instructions simultaneously.
Instruction finetuning enables diverse translation tasks to be handled jointly.
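A single instruction-finetuned NMT model performs comparably to LLMs such as GPT-3.5-Turbo on formality-controlled translation, multi-domain adaptation, and multi-modal translation.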
Abstract
In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks. We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions. We also show that through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation as well as multi-modal translations can be tackled jointly by a single instruction finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo. To the best of our knowledge, our work is…
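As a rough illustration of what following multiple instructions simultaneously could look like at the data level, the sketch below prepends natural-language instructions to a source sentence before it is passed to an NMT model. The delimiters `<instr>` and `</instr>`, the names `annotate`, `make_example`, and `Example`, and the sample instructions are all hypothetical; the input format actually used in the paper is not given in this summary.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical delimiters; the paper's real input format is not stated here.
INSTR_OPEN = "<instr>"
INSTR_CLOSE = "</instr>"


@dataclass
class Example:
    source: str  # instruction-annotated source text for the NMT encoder
    target: str  # translation that is assumed to obey the instructions


def annotate(source: str, instructions: Sequence[str]) -> str:
    """Prepend each non-empty instruction, wrapped in delimiters, to the source."""
    parts: List[str] = [
        f"{INSTR_OPEN} {ins.strip()} {INSTR_CLOSE}"
        for ins in instructions
        if ins.strip()
    ]
    parts.append(source.strip())
    return " ".join(parts)


def make_example(source: str, instructions: Sequence[str], target: str) -> Example:
    """Pair an annotated source with its target translation."""
    return Example(annotate(source, instructions), target.strip())


# A single instruction, as might be seen during finetuning (assumed setup).
single = annotate("How are you?", ["Use a formal register."])

# Two instructions combined at inference time; in this sketch, composing
# instructions is just concatenating their annotations.
composed = annotate(
    "How are you?",
    ["Use a formal register.", "Translate for the medical domain."],
)
```

Under this assumed format, composing instructions needs no change to the model interface: a combination never seen during finetuning is simply a longer prefix, which is one plausible way to read the zero-shot composition claim in the abstract.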
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Residual Connection · Weight Decay · Cosine Annealing · Dropout · Byte Pair Encoding
