Optimal Corpus Aware Training for Neural Machine Translation

Yi-Hsiu Liao; Cheng Shen; Brenda (Zixiaofan) Yang

arXiv:2508.05364·cs.LG·August 8, 2025

TL;DR

This paper introduces Optimal Corpus Aware Training (OCAT), a lightweight fine-tuning method that improves neural machine translation by effectively leveraging corpus metadata, resulting in higher accuracy and robustness.

Contribution

OCAT fine-tunes a CAT pre-trained model by adjusting only a small set of corpus-related parameters, improving translation quality while reducing the risk of overfitting and the sensitivity to hyperparameters.
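
The idea of freezing most parameters and updating only the corpus-related ones can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names, the `corpus_tag.` prefix, and the plain SGD update are all assumptions made for illustration, and the listing does not say how the paper parameterizes its corpus-related weights.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def select_trainable(names: List[str], prefix: str = "corpus_tag.") -> List[str]:
    """Return the parameter names to update; everything else stays frozen.

    The prefix is a hypothetical naming convention, not taken from the paper.
    """
    return [n for n in names if n.startswith(prefix)]


def sgd_step(params: Params, grads: Params, trainable: List[str], lr: float) -> Params:
    """Apply one SGD update to the trainable parameters only."""
    updated = {name: list(values) for name, values in params.items()}
    for name in trainable:
        updated[name] = [p - lr * g for p, g in zip(params[name], grads[name])]
    return updated


# Toy model: one "backbone" weight vector and one corpus tag vector.
params = {"encoder.w": [1.0, 2.0], "corpus_tag.news": [1.0, 1.0]}
grads = {"encoder.w": [1.0, 1.0], "corpus_tag.news": [1.0, -1.0]}

trainable = select_trainable(list(params))
new_params = sgd_step(params, grads, trainable, lr=0.5)
```

In this toy setup only `corpus_tag.news` changes after the step, while `encoder.w` keeps its original values even though it has a nonzero gradient.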

Findings

01. +3.6 chrF improvement on WMT23 English-Chinese translation

02. +1.8 chrF improvement on English-German translation

03. Comparable or better performance than state-of-the-art fine-tuning methods

Abstract

Corpus Aware Training (CAT), commonly known as the "tagging" approach, leverages valuable corpus metadata during training by injecting corpus information into each training example, and has been found effective in the literature. Models trained with CAT inherently learn the differences in quality, domain, and nuance between corpora directly from data, and can easily switch to different inference behavior. To achieve the best evaluation results, CAT models pre-define a group of high-quality data before training starts, which can be error-prone and inefficient. In this work, we propose Optimal Corpus Aware Training (OCAT), which fine-tunes a CAT pre-trained model by freezing most of the model parameters and tuning only a small set of corpus-related parameters. We show that OCAT is lightweight, resilient to overfitting, and effective in boosting model accuracy. We use WMT23 English to Chinese and English to German…
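
As a rough illustration of the "tagging" approach described in the abstract, the sketch below prepends a corpus tag to each source sentence. The tag format (`<name>`), the corpus names, and the example sentences are hypothetical; the abstract does not specify how the paper encodes corpus information.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_example(source: str, corpus: str) -> str:
    """Prepend a corpus tag token to the source sentence.

    The "<corpus>" token format is an assumption for illustration.
    """
    return f"<{corpus}> {source}"


def build_tagged_corpus(corpora: Dict[str, List[Pair]]) -> List[Pair]:
    """Merge several corpora into one training set with tagged sources."""
    tagged: List[Pair] = []
    for name, pairs in corpora.items():
        for src, tgt in pairs:
            tagged.append((tag_example(src, name), tgt))
    return tagged


# Hypothetical corpora of differing quality.
corpora = {
    "news": [("Hello world.", "Hallo Welt.")],
    "crawl": [("hello world", "hallo welt")],
}
tagged = build_tagged_corpus(corpora)

# At inference time, the same function selects the behavior of one corpus.
query = tag_example("Good morning.", "news")
```

Choosing which tag to use at inference is the step that plain CAT fixes ahead of training by pre-defining a high-quality group; per the abstract, OCAT instead tunes the corpus-related parameters after pre-training.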
