Language Model Alignment with Elastic Reset

Michael Noukhovitch; Samuel Lavoie; Florian Strub; Aaron Courville

arXiv:2312.07551·cs.CL·December 14, 2023·1 cites

TL;DR

This paper introduces Elastic Reset, an algorithm for RL fine-tuning of language models that achieves higher reward with less drift without explicitly modifying the training objective, as shown on pivot-translation, sentiment, and QA chatbot benchmarks.

Contribution

Elastic Reset periodically resets the online model to an exponential moving average (EMA) of itself, then resets the EMA to the initial model, which raises reward and reduces drift without changing the training objective.
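The reset schedule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: parameters are a toy dict of floats rather than a real network, and the names `train`, `rl_step`, `reset_every`, and `decay`, as well as the choice to update the EMA after every RL step, are assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]  # toy stand-in for a model's named parameters


def ema_update(ema: Params, online: Params, decay: float) -> None:
    """Move each EMA parameter toward the matching online parameter, in place."""
    for name, value in online.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


def elastic_reset(online: Params, ema: Params, initial: Params) -> None:
    """Reset the online model to the EMA, then reset the EMA to the initial model."""
    online.update(ema)   # online <- EMA
    ema.update(initial)  # EMA <- initial


def train(
    initial: Params,
    num_steps: int,
    reset_every: int,  # hypothetical hyperparameter: steps between resets
    decay: float,      # hypothetical hyperparameter: EMA decay rate
    rl_step: Callable[[Params], None],  # any RL update that mutates `online`
) -> Tuple[Params, Params]:
    online = dict(initial)  # copies, so `initial` is never modified
    ema = dict(initial)
    for step in range(1, num_steps + 1):
        rl_step(online)                  # unmodified training objective
        ema_update(ema, online, decay)   # assumed: EMA tracks every step
        if step % reset_every == 0:
            elastic_reset(online, ema, initial)
    return online, ema
```

Note that the RL update itself is left untouched; the only intervention is the periodic two-stage reset, which matches the claim that the training objective is not explicitly modified.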

Findings

01 Achieves higher reward with less drift compared to standard methods.

02 Outperforms baselines on pivot-translation and sentiment tasks.

03 Produces a more aligned and effective QA chatbot.

Abstract

Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly used test metrics are insufficient and instead measure how different algorithms trade off between reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly…
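For contrast, the standard KL-penalized reward that the abstract refers to can be sketched as follows. This is a common formulation and not code from the paper: the function name `kl_penalized_reward`, the coefficient `beta`, and the use of a per-sample estimate (the sum of token log-probability differences between the online and initial model) are assumptions made for illustration.

```python
from typing import Sequence


def kl_penalized_reward(
    reward: float,
    logp_online: Sequence[float],   # per-token log-probs under the online model
    logp_initial: Sequence[float],  # per-token log-probs under the initial model
    beta: float,                    # hypothetical penalty coefficient
) -> float:
    """Return the reward minus beta times a sample-based KL estimate."""
    if len(logp_online) != len(logp_initial):
        raise ValueError("log-prob sequences must have the same length")
    # Single-sample estimate of KL(online || initial) for one generated sequence.
    kl_estimate = sum(lo - li for lo, li in zip(logp_online, logp_initial))
    return reward - beta * kl_estimate
```

Here the penalty changes the quantity being optimized, which is the explicit objective modification that Elastic Reset is said to avoid.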

Code & Models

Repositories

mnoukhov/elastic-reset
PyTorch · Official

Videos

Language Model Alignment with Elastic Reset · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems