Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord

TL;DR
This paper introduces rewarded soup, a method that interpolates weights of multiple models fine-tuned on diverse rewards to achieve Pareto-optimal alignment across various tasks and preferences.
Contribution
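It proposes a multi-policy approach that fine-tunes one network per proxy reward and then linearly interpolates their weights. The approach relies on the empirical observation that weights fine-tuned on different rewards remain linearly connected, so the interpolated networks can trade off the diverse objectives.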
Findings
Effective across text, image, and control tasks
Weights remain linearly connected after diverse reward fine-tuning
Improves alignment by interpolating multiple specialized models
Abstract
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the proxy reward may hinder the training and lead to suboptimal results; the diversity of objectives in real-world tasks and human opinions exacerbate the issue. This paper proposes embracing the heterogeneity of diverse rewards by following a multi-policy strategy. Rather than focusing on a single a priori reward, we aim for Pareto-optimal generalization across the entire space of preferences. To this end, we propose rewarded soup, first specializing multiple networks independently (one for each proxy reward) and then interpolating their weights linearly. This succeeds empirically because we show that the weights remain linearly connected when…
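The interpolation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Weights` type, the `interpolate_weights` helper, the parameter names, and the toy values are all hypothetical, and the sketch assumes that every fine-tuned network has the same architecture and that the interpolation coefficients are non-negative and sum to 1.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model's parameters: name -> flat list of floats.
Weights = Dict[str, List[float]]


def interpolate_weights(models: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Return the coefficient-weighted average of several parameter dicts."""
    if not models or len(models) != len(coeffs):
        raise ValueError("need at least one model and one coefficient per model")
    # Assumption: coefficients form a convex combination.
    if any(c < 0 for c in coeffs) or abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    names = models[0].keys()
    if any(m.keys() != names for m in models):
        raise ValueError("all models must share the same parameter names")

    merged: Weights = {}
    for name in names:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across the specialized models.
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(size)
        ]
    return merged


# Toy example: two networks, each imagined as fine-tuned on a different reward.
theta_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
theta_b = {"layer.weight": [3.0, 0.0], "layer.bias": [1.0]}

# Put 25% of the preference on reward A and 75% on reward B.
soup = interpolate_weights([theta_a, theta_b], [0.25, 0.75])
```

Varying the coefficients yields a family of networks from the same set of specialized models without further training, which is how a single set of fine-tuned weights could cover a range of preferences.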
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
