Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem

TL;DR
This paper introduces Conditional Language Policy (CLP), a flexible framework for finetuning language models to balance multiple conflicting objectives efficiently without needing multiple models.
Contribution
The paper proposes CLP, a novel method that enables steerable multi-objective finetuning of language models, outperforming existing approaches in Pareto efficiency.
Findings
CLP effectively trades off conflicting objectives at inference.
CLP outperforms existing multi-objective finetuning methods.
CLP does not require multiple models for different trade-offs.
Abstract
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learns steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate the existing approaches for…
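To make the notion of Pareto dominance used above concrete, here is a minimal sketch that is not taken from the paper. It assumes each model or decoding setting is summarized by a tuple of per-objective reward scores where higher is better. The function names `dominates` and `pareto_front` and the example scores are hypothetical and serve only as illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector `a` Pareto-dominates `b` (higher is better)."""
    if len(a) != len(b):
        raise ValueError("reward vectors must have the same length")
    # `a` must be at least as good on every objective...
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    # ...and strictly better on at least one.
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores, for illustration only.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(candidates)
```

In this toy example, `(0.5, 0.5)` is dropped because `(0.6, 0.6)` is better on both objectives, while the remaining three points represent different trade-offs that do not dominate one another.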
Taxonomy
Topics: Linguistic research and analysis · Syntax, Semantics, Linguistic Variation · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
