Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

Joel Jang; Seungone Kim; Bill Yuchen Lin; Yizhong Wang; Jack Hessel; Luke Zettlemoyer; Hannaneh Hajishirzi; Yejin Choi; Prithviraj Ammanabrolu

arXiv:2310.11564 · cs.CL · October 19, 2023 · 6 cites

TL;DR

This paper introduces a method for aligning large language models to individual user preferences by decomposing preferences into multiple dimensions, training them independently, and merging parameters post-hoc for personalized responses.

Contribution

It proposes a novel approach to personalized LLM alignment using multi-objective reinforcement learning and post-hoc parameter merging, improving over traditional single-objective methods.

Findings

01 Personalized alignment outperforms single-objective baselines.

02 Preferences can be decomposed into multiple dimensions for training.

03 Effective post-hoc parameter merging enables personalized responses.

Abstract

While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong single-objective baselines, we show that we can achieve personalized alignment by decomposing preferences into multiple dimensions. These dimensions are defined based on personalizations that the user declares as desirable. We further show that these dimensions can be trained efficiently and independently in a distributed manner, then combined effectively post-hoc through parameter merging. The code is available at…
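
The post-hoc merging step described in the abstract can be illustrated with a minimal sketch. It assumes each preference dimension has produced its own set of parameters with identical names and shapes, and that merging is a weighted average of those parameters. The function name `merge_parameters`, the plain-list parameter format, and the example preference names are hypothetical and are not taken from the paper's repository.

```python
from typing import Dict, List, Sequence

# Hypothetical format: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def merge_parameters(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the weighted average of several parameter sets.

    Weights are normalized to sum to 1, so callers may pass
    unnormalized preference strengths.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model, and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    merged: Params = {}
    for name, reference in models[0].items():
        # Average each scalar entry across all models.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(reference))
        ]
    return merged


# Two toy "policies", each assumed to be trained on one preference dimension.
concise = {"layer.weight": [1.0, 2.0]}
friendly = {"layer.weight": [3.0, 4.0]}

# A user who wants both preferences equally gets the midpoint of the two.
merged = merge_parameters([concise, friendly], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 3.0]}
```

Because merging happens after training, a new combination of preferences in this sketch only requires a new weight vector, not another training run.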

Code & Models

Repositories

joeljang/rlphf
PyTorch · Official

Datasets

lFelix/pSoups
dataset · 7 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems