Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Alexandre Ramé; Guillaume Couairon; Mustafa Shukor; Corentin Dancette; Jean-Baptiste Gaya; Laure Soulier; Matthieu Cord

arXiv:2306.04488·cs.LG·October 17, 2023·6 cites

TL;DR

This paper introduces rewarded soup, a method that linearly interpolates the weights of multiple models, each fine-tuned on a different reward, to move towards Pareto-optimal alignment across diverse tasks and preferences.

Contribution

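It proposes a multi-policy strategy: specialize one network per proxy reward, then linearly interpolate the resulting weights. The approach relies on the empirical observation that weights fine-tuned on different rewards remain linearly connected, so interpolated models can trade off between diverse objectives.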

Findings

01 Effective across text, image, and control tasks

02 Weights remain linearly connected after diverse reward fine-tuning

03 Improves alignment by interpolating multiple specialized models

Abstract

Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the proxy reward may hinder the training and lead to suboptimal results; the diversity of objectives in real-world tasks and human opinions exacerbates the issue. This paper proposes embracing the heterogeneity of diverse rewards by following a multi-policy strategy. Rather than focusing on a single a priori reward, we aim for Pareto-optimal generalization across the entire space of preferences. To this end, we propose rewarded soup, first specializing multiple networks independently (one for each proxy reward) and then interpolating their weights linearly. This succeeds empirically because we show that the weights remain linearly connected when…
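
The interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interpolate_weights`, the `Weights` type, and the toy parameter values are all hypothetical, and plain Python lists stand in for the tensors of a real model. It assumes every network shares the same architecture and parameter names, and that the coefficients are non-negative and sum to 1.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model's parameters: name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(models: Sequence[Weights], lambdas: Sequence[float]) -> Weights:
    """Return the coefficient-weighted average of several weight sets."""
    if not models or len(models) != len(lambdas):
        raise ValueError("need one coefficient per model")
    if any(lam < 0 for lam in lambdas) or abs(sum(lambdas) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    keys = models[0].keys()
    if any(m.keys() != keys for m in models):
        raise ValueError("all models must share the same parameter names")

    soup: Weights = {}
    for name in keys:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        soup[name] = [sum(lam * w for lam, w in zip(lambdas, col)) for col in columns]
    return soup


# Toy weights standing in for two networks fine-tuned on different rewards.
theta_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
theta_b = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

# Sweeping the coefficient yields a family of candidate trade-off models.
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, interpolate_weights([theta_a, theta_b], [1.0 - lam, lam]))
```

Each coefficient setting produces one interpolated model; which setting is preferable depends on how the rewards are weighted at evaluation time, which this sketch does not cover.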

Code & Models

Repositories

alexrame/rewardedsoups (PyTorch, official)

Models

🤗 Samzy17/gpt2-imdb-movie-reviews-negative (model)

Videos

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards · SlidesLive

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques