FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users

Anikait Singh; Sheryl Hsu; Kyle Hsu; Eric Mitchell; Stefano Ermon; Tatsunori Hashimoto; Archit Sharma; and Chelsea Finn

arXiv:2502.19312 · cs.LG · April 20, 2026

TL;DR

FSPO is a few-shot learning algorithm that personalizes large language models by quickly inferring user-specific reward functions from minimal preferences, using synthetic data for training.

Contribution

The paper introduces FSPO, a novel meta-learning approach for LLM personalization that leverages synthetic preference data and a new user description rationalization technique.

Findings

01 FSPO achieves an 87% winrate in synthetic user personalization.

02 FSPO attains a 70% winrate with real users in open-ended QA.

03 Synthetic data with high diversity and coherence is crucial for transfer to real users.

Abstract

Effective personalization of LLMs is critical for a broad range of user-interfacing applications such as virtual assistants and content curation. Inspired by the strong in-context capabilities of LLMs, we propose few-shot preference optimization (FSPO), an algorithm for LLM personalization that reframes reward modeling as a meta-learning problem. Under FSPO, an LLM learns to quickly infer a personalized reward function for a user via a few labeled preferences. FSPO also utilizes user description rationalization (RAT) to encourage better reward modeling and instruction following, recovering performance with the oracle user description. Since real-world preference data is challenging to collect at scale, we propose careful design choices to construct synthetic preference datasets for personalization, generating over 1M synthetic personalized preferences using publicly available LLMs. To…
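As a rough illustration of the few-shot framing described in the abstract, the sketch below serializes a user's labeled preferences into a context prefix and scores a held-out preference pair with a pairwise loss. The prompt layout, the `Preference` type, both function names, and the choice of a DPO-style Bradley-Terry objective with a reference model and `beta` are all assumptions made for this example; the text above does not specify the training objective or prompt format.

```python
import math
from dataclasses import dataclass


@dataclass
class Preference:
    """One labeled preference from a user (hypothetical container)."""
    prompt: str
    chosen: str
    rejected: str


def build_fewshot_context(shots, query):
    """Serialize a user's few labeled preferences plus a new query.

    The layout is an assumed format for illustration only.
    """
    parts = []
    for i, s in enumerate(shots, start=1):
        parts.append(
            f"Example {i}\n"
            f"Question: {s.prompt}\n"
            f"Preferred: {s.chosen}\n"
            f"Dispreferred: {s.rejected}\n"
        )
    parts.append(f"Question: {query}\nAnswer:")
    return "\n".join(parts)


def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style pairwise loss (assumed objective, not confirmed by the text).

    Inputs are sequence log-probabilities of the held-out responses,
    computed with the few-shot context prepended to the query.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


shots = [
    Preference("Explain recursion.", "A short analogy.", "A formal proof."),
    Preference("What is a mutex?", "A one-line summary.", "A long history."),
]
context = build_fewshot_context(shots, "What is a hash map?")
```

In a meta-learning setup of this kind, each training example would pair a context built from some of a user's preferences with a held-out pair from the same user, so that lowering the loss requires the model to read the user's tastes from the context rather than memorize them in its weights.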

Code & Models

Repositories

asap7772/fewshot-preference-optimization (GitHub)
