Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Hippolyte Gisserot-Boukhlef; Ricardo Rei; Emmanuel Malherbe; Céline Hudelot; Pierre Colombo; Nuno M. Guerreiro

arXiv:2409.20059·cs.CL·November 21, 2025

TL;DR

This paper empirically examines whether preference-based alignment, specifically Contrastive Preference Optimization (CPO), improves LLM-based translation quality. It finds gains over supervised fine-tuning (SFT) on the alignment metric, but also instability across other evaluation metrics.

Contribution

The paper provides an extensive empirical evaluation of preference-based alignment techniques such as CPO for LLM-based translation, highlighting their advantages and limitations relative to supervised fine-tuning.

Findings

01

On high-quality data, CPO consistently outperforms SFT with regard to the alignment metric.

02

Preference-based alignment may cause instability across different evaluation metrics.

03

Generating candidate translations with the base model alone can match the performance obtained with external systems, with better consistency across metrics.

Abstract

Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics through quality-informed decoding strategies, achieving better results than likelihood-based methods. With the rise of Large Language Models (LLMs), preference-based alignment techniques have gained attention for their potential to enhance translation quality by optimizing model weights directly on preferences induced by quality estimators. This study focuses on Contrastive Preference Optimization (CPO) and conducts extensive experiments to evaluate the impact of preference-based alignment on translation quality. Our findings indicate that while CPO consistently outperforms Supervised Fine-Tuning (SFT) on high-quality data with regard to the alignment…
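The idea of "preferences induced by quality estimators" can be illustrated with a minimal sketch. It assumes that a preference pair is formed by taking the highest- and lowest-scored candidate translations for a source sentence, and that the CPO objective combines a sigmoid preference term with a negative log-likelihood term on the preferred translation, as in the original CPO formulation. The function names, the `beta` value, and the example scores are hypothetical; the pair-selection rule is an assumption and not necessarily the one used in this paper.

```python
import math


def build_preference_pair(candidates, scores):
    """Return (chosen, rejected) from metric scores.

    Assumption: the best-scored candidate is preferred and the
    worst-scored one is rejected; other selection rules are possible.
    """
    if len(candidates) != len(scores) or len(candidates) < 2:
        raise ValueError("need at least two candidates, one score each")
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
    return ranked[-1][1], ranked[0][1]


def cpo_loss(logp_chosen, logp_rejected, beta=0.1):
    """Per-example CPO-style loss from sequence log-probabilities.

    loss = -log(sigmoid(beta * (logp_chosen - logp_rejected))) - logp_chosen
    beta=0.1 is an illustrative default, not a value from the paper.
    """
    margin = beta * (logp_chosen - logp_rejected)
    # Numerically stable -log(sigmoid(margin)).
    if margin > 0:
        prefer = math.log1p(math.exp(-margin))
    else:
        prefer = -margin + math.log1p(math.exp(margin))
    nll = -logp_chosen
    return prefer + nll


# Hypothetical candidates with hypothetical quality-estimator scores.
chosen, rejected = build_preference_pair(
    ["translation A", "translation B", "translation C"],
    [0.71, 0.88, 0.42],
)
```

In practice the log-probabilities would come from the model being trained, and the loss would be averaged over a batch; the scalar version here only shows how the two terms combine.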

Code & Models

Datasets

hgissbkh/WMT22-23-Test-Metrics
dataset · 20 dl

Videos

Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification

Methods: Softmax · Attention Is All You Need · Balanced Selection