Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes

Katarzyna Kobalczyk; Claudio Fanconi; Hao Sun; Mihaela van der Schaar

arXiv:2412.13998 · cs.LG · December 19, 2024

TL;DR

This paper introduces a novel few-shot framework for aligning large language models with diverse user preferences by inferring underlying preferences from minimal data and enabling real-time behavioral adaptation.

Contribution

It extends the Bradley-Terry-Luce model for heterogeneous preferences and proposes a functional parameter-space conditioning method for efficient, personalized LLM alignment.
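As a rough illustration of what a Bradley-Terry-Luce (BTL) model with heterogeneous preferences can look like, the sketch below computes the standard BTL preference probability, sigmoid(r_a - r_b), and then makes the reward depend on a per-user vector. The linear `conditioned_reward`, the two-dimensional response features, and the user weight vectors are all hypothetical stand-ins chosen for readability; they are not the paper's architecture, which infers the user-specific conditioning from a small sample of choices.

```python
import math


def btl_preference_prob(reward_a: float, reward_b: float) -> float:
    """Standard BTL: probability that response a is preferred over b."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def conditioned_reward(features, user_weights):
    """Hypothetical user-conditioned reward: a plain dot product.

    Assumption for illustration only; the paper's reward model is not
    specified on this page.
    """
    return sum(f * w for f, w in zip(features, user_weights))


# Hypothetical response features: (helpfulness, brevity).
resp_a = (0.9, 0.2)
resp_b = (0.4, 0.8)

# Two hypothetical users who value different attributes.
user_1 = (1.0, 0.0)  # cares only about helpfulness
user_2 = (0.0, 1.0)  # cares only about brevity

p_user_1 = btl_preference_prob(
    conditioned_reward(resp_a, user_1), conditioned_reward(resp_b, user_1)
)
p_user_2 = btl_preference_prob(
    conditioned_reward(resp_a, user_2), conditioned_reward(resp_b, user_2)
)

print(f"user 1 prefers a with probability {p_user_1:.3f}")
print(f"user 2 prefers a with probability {p_user_2:.3f}")
```

The same pair of responses is ranked in opposite directions by the two users, which is the kind of conflicting signal a single shared reward cannot represent.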

Findings

01 Effective in capturing diverse human preferences

02 Data-efficient adaptation to individual users

03 Enables real-time behavioral mode switching

Abstract

As large language models (LLMs) become increasingly embedded in everyday applications, ensuring their alignment with the diverse preferences of individual users has become a critical challenge. Currently deployed approaches typically assume homogeneous user objectives and rely on single-objective fine-tuning. However, human preferences are inherently heterogeneous, influenced by various unobservable factors, leading to conflicting signals in preference data. Existing solutions addressing this diversity often require costly datasets labelled for specific objectives and involve training multiple reward models or LLM policies, which is computationally expensive and impractical. In this work, we present a novel framework for few-shot steerable alignment, where users' underlying preferences are inferred from a small sample of their choices. To achieve this, we extend the Bradley-Terry-Luce…

Code & Models

Repositories

kasia-kobalczyk/few-shot-steerable-alignment
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN