Post-edits Are Preferences Too

Nathaniel Berger; Miriam Exel; Matthias Huck; Stefan Riezler

arXiv:2410.02320·cs.CL·February 24, 2025

TL;DR

This paper investigates whether post-edits can serve as a reliable source of human preferences for fine-tuning large language models on machine translation, and reports that pre-training on post-edits moves model outputs closer to post-edit-like hypotheses.

Contribution

The paper demonstrates that post-edits can serve as implicit preferences for preference optimization, which moves the model toward generating post-edit-like hypotheses.

Findings

01 Post-edits can be used as implicit preferences for fine-tuning.

02 Pre-training on post-edits leads to better alignment with post-edit-like outputs.

03 Post-edits help models move away from machine translation-like hypotheses.

Abstract

Preference Optimization (PO) techniques are currently among the state-of-the-art techniques for fine-tuning large language models (LLMs) on pairwise preference feedback from human annotators. However, in machine translation, this sort of feedback can be difficult to solicit. Additionally, Kreutzer et al. (2018) have shown that, for machine translation, pairwise preferences are less reliable than other forms of human feedback, such as 5-point ratings. We examine post-edits to see if they can be a source of reliable human preferences by construction. In PO, a human annotator is shown sequences $s_{1}$ and $s_{2}$ and asked for a preference judgment (e.g., $s_{1} > s_{2}$), while for post-editing, editors create $s_{1}$ and know that it should be better than $s_{2}$. We attempt to use these implicit preferences for PO and show that it helps the model move towards post-edit-like hypotheses and away from…
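To make the "preferences by construction" idea concrete, here is a minimal Python sketch of how post-editing records might be turned into preference pairs, with the post-edit as $s_{1}$ (chosen) and the original machine translation as $s_{2}$ (rejected). It also scores one pair with the DPO loss, used here only as one widely known PO objective; this page does not state which objective the paper uses. The names `PreferencePair`, `pairs_from_post_edits`, and `dpo_loss`, the record layout, the choice to skip unedited records, and the value `beta=0.1` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    prompt: str    # source sentence to translate
    chosen: str    # post-edit (s1), preferred by construction
    rejected: str  # original machine translation (s2)


def pairs_from_post_edits(records):
    """Turn (source, mt, post_edit) triples into preference pairs.

    Assumption of this sketch: a record whose post-edit equals the MT
    output carries no preference signal, so it is skipped.
    """
    pairs = []
    for source, mt, post_edit in records:
        if mt.strip() == post_edit.strip():
            continue
        pairs.append(PreferencePair(source, post_edit, mt))
    return pairs


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for a single pair, given sequence log-probabilities.

    loss = -log sigmoid(beta * margin), where margin is the policy's
    log-ratio advantage on the chosen sequence minus its advantage on
    the rejected one, both measured against a frozen reference model.
    """
    margin = ((policy_chosen_lp - ref_chosen_lp)
              - (policy_rejected_lp - ref_rejected_lp))
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    records = [
        ("Das Haus ist alt.", "The house is aged.", "The house is old."),
        ("Guten Morgen.", "Good morning.", "Good morning."),  # unedited
    ]
    for pair in pairs_from_post_edits(records):
        print(pair)
    print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals $\log 2$; the loss falls below that as the policy shifts probability toward the post-edit and away from the machine translation.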
