Decoding-Time Language Model Alignment with Multiple Objectives

Ruizhe Shi; Yifang Chen; Yushi Hu; Alisa Liu; Hannaneh Hajishirzi; Noah A. Smith; Simon S. Du

arXiv:2406.18853 · cs.LG · October 29, 2024 · 1 cites

TL;DR

This paper introduces multi-objective decoding (MOD), a decoding-time algorithm that combines the next-token predictions of multiple base models, each aligned to a different objective, according to user-specified weights over those objectives.

Contribution

The paper presents a closed-form solution for multi-objective decoding based on $f$-divergence regularization, with theoretical guarantees and empirical validation showing significant improvements.
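For intuition, the reverse-KL member of the $f$-divergence family can be worked out by hand. The following is a sketch under assumptions that this page does not state: each base policy $\pi_i$ is the optimum of a KL-regularized objective with reward $r_i$, a shared reference policy $\pi_{\mathrm{ref}}$, and a shared regularization strength $\beta$, and the weights $w_i$ are non-negative and sum to 1. It is not the paper's general result.

```latex
\begin{align*}
\pi_i(y \mid x) &\propto \pi_{\mathrm{ref}}(y \mid x)\,
    \exp\!\big(r_i(x, y)/\beta\big), \\
\pi_w(y \mid x) &\propto \pi_{\mathrm{ref}}(y \mid x)\,
    \exp\!\Big(\tfrac{1}{\beta}\textstyle\sum_i w_i\, r_i(x, y)\Big) \\
  &\propto \pi_{\mathrm{ref}}(y \mid x)\,
    \prod_i \left(\frac{\pi_i(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}\right)^{w_i}
  \;=\; \prod_i \pi_i(y \mid x)^{w_i}
  \qquad \text{when } \textstyle\sum_i w_i = 1 .
\end{align*}
```

Under these assumptions the policy for the weighted reward $\sum_i w_i r_i$ is a weighted geometric mean of the base policies, so it can be formed from their predictions without retraining. The identity holds over full responses $y$; applying it one token at a time is a further approximation.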

Findings

01

MOD achieves a 12.8% reward improvement over the baseline.

02

MOD reduces toxicity to nearly 0% on Toxigen.

03

MOD improves multiple metrics by 7.9–33.3%.

Abstract

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models, for any given weightings over different objectives. We exploit a common form among a family of $f$-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution by Legendre transform, and derive an efficient decoding strategy. Theoretically, we show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method. Empirical results…
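The decoding step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the reverse-KL case, in which the linear combination is taken over per-model log-probabilities, and it assumes the weights sum to 1. The function names `log_softmax`, `mod_next_token_distribution`, and `greedy_next_token` are hypothetical and do not come from the `srzer/mod` repository.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mod_next_token_distribution(per_model_logits, weights):
    """Combine next-token predictions from several base models.

    per_model_logits: one logit vector per base model, all over the same vocabulary.
    weights: one non-negative preference weight per model (assumed to sum to 1).
    Returns a normalized probability distribution over the vocabulary.
    """
    if len(per_model_logits) != len(weights):
        raise ValueError("need exactly one weight per model")
    log_probs = [log_softmax(logits) for logits in per_model_logits]
    vocab_size = len(log_probs[0])
    # Assumption: the combination is linear in log-probability space.
    combined = [
        sum(w * lp[token] for w, lp in zip(weights, log_probs))
        for token in range(vocab_size)
    ]
    # Renormalize so the result is a valid distribution.
    return [math.exp(x) for x in log_softmax(combined)]


def greedy_next_token(per_model_logits, weights):
    """Pick the most likely token under the combined distribution."""
    probs = mod_next_token_distribution(per_model_logits, weights)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy example: two models that prefer different tokens in a 3-token vocabulary.
model_a = [2.0, 0.0, 0.0]
model_b = [0.0, 2.0, 0.0]
mostly_a = greedy_next_token([model_a, model_b], [0.8, 0.2])
mostly_b = greedy_next_token([model_a, model_b], [0.2, 0.8])
```

Changing the weights changes which model dominates the choice of next token, with no retraining; in this toy example `mostly_a` picks model A's preferred token and `mostly_b` picks model B's.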

Code & Models

Repositories

srzer/mod · PyTorch · Official

Videos

Decoding-Time Language Model Alignment with Multiple Objectives · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Direct Preference Optimization · Balanced Selection · Entropy Regularization · Focus · Proximal Policy Optimization