Orchestrating LLMs with Different Personalizations
Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

TL;DR
This paper introduces a black-box method to personalize large language models by merging outputs from specialized expert models based on user preferences, avoiding re-training.
Contribution
It proposes a Preference Control Model that dynamically combines expert model outputs at the token level to align LLMs with individual preferences without re-training.
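Token-level combination can be illustrated with a small sketch. It assumes that all experts share one vocabulary and that their next-token distributions are mixed as a weighted average of probabilities. The paper may combine logits or use a different rule, so treat `merge_next_token` and `softmax` as hypothetical helpers rather than the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_next_token(expert_logits: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Mix per-expert next-token distributions using nonnegative weights.

    Assumption (not from the paper): the mixture is taken over probabilities,
    and the weights are normalized to sum to 1 before mixing.
    """
    if len(expert_logits) != len(weights):
        raise ValueError("need one weight per expert")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    vocab = len(expert_logits[0])
    mixed = [0.0] * vocab
    for logits, w in zip(expert_logits, weights):
        # Convert each expert's logits to probabilities, then add its weighted share.
        for i, p in enumerate(softmax(logits)):
            mixed[i] += (w / total_w) * p
    return mixed


# Two toy experts over a 3-token vocabulary, weighted equally.
mixed = merge_next_token([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]], [0.5, 0.5])
next_token = max(range(len(mixed)), key=lambda i: mixed[i])
```

The weights are recomputed at every decoding step, so the blend can lean on a different expert for each token. The experts are used only through their output distributions, which keeps the approach black-box.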
Findings
Matches or surpasses existing preference merging techniques
Provides a scalable, efficient alternative to fine-tuning
Effective in aligning LLM outputs with multiple human preferences
Abstract
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to obtain, without re-training, an LLM that best adheres to this specification. Starting from specialized expert LLMs, each trained for one particular preference dimension, we propose a black-box method that merges their outputs on a per-token level. We train a lightweight Preference Control Model (PCM) that dynamically translates the preference description and current context into next-token prediction weights. By combining the expert models' outputs at the token level, our approach dynamically generates text that optimizes the given preference. Empirical tests show…
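The PCM's role of mapping a preference description and the current context to per-expert weights can be shown with a toy stand-in. In this sketch, `ToyPreferenceControlModel` is hypothetical, and several assumptions apply that the abstract does not state: the preference description is already encoded as a numeric vector, the context is summarized by a fixed-size feature vector, and the model is a single linear layer followed by a softmax. The actual PCM is a trained model whose architecture is not specified here.

```python
import math
from typing import List, Sequence


class ToyPreferenceControlModel:
    """Hypothetical PCM stand-in: (preference, context) features -> expert weights."""

    def __init__(self, weight_rows: Sequence[Sequence[float]], bias: Sequence[float]):
        # One row of weights and one bias term per expert.
        if len(weight_rows) != len(bias):
            raise ValueError("need one weight row and one bias per expert")
        self.weight_rows = [list(r) for r in weight_rows]
        self.bias = list(bias)

    def expert_weights(self, preference: Sequence[float],
                       context: Sequence[float]) -> List[float]:
        features = list(preference) + list(context)
        scores = []
        for row, b in zip(self.weight_rows, self.bias):
            if len(row) != len(features):
                raise ValueError("feature size mismatch")
            # Linear score for this expert.
            scores.append(sum(w * x for w, x in zip(row, features)) + b)
        # Softmax turns the scores into nonnegative weights that sum to 1.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


# Three hypothetical experts (helpful, concise, humorous); 3 preference dims + 2 context features.
pcm = ToyPreferenceControlModel(
    weight_rows=[[4, 0, 0, 0, 0], [0, 4, 0, 0, 0], [0, 0, 4, 0, 0]],
    bias=[0.0, 0.0, 0.0],
)
weights = pcm.expert_weights(preference=[0.1, 0.9, 0.0], context=[0.3, 0.7])
```

Calling `expert_weights` again at every step with an updated context vector makes the weighting change token by token. In this toy example, a user who mostly values conciseness gets the largest weight on the second expert.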
Taxonomy
Topics: Natural Language Processing Techniques · Digital Rights Management and Security
