Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model
Nathan Kallus

TL;DR
This paper introduces a semiparametric single-index model for preference alignment of large language models. It handles an unknown link function and develops robust algorithms with convergence guarantees, supported by empirical results.
Contribution
The paper proposes a semiparametric model for preference alignment that accommodates an unknown link function, and it develops robust algorithms with theoretical guarantees.
Findings
The algorithms converge, with guarantees that hold under generic function-complexity measures.
Empirical results demonstrate the approach's effectiveness on LLM alignment.
The model accommodates nonparametric indices that are not identifiable.
Abstract
Aligning large language models (LLMs) to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., a logistic Bradley-Terry link). Misspecification of this link can bias inferred rewards and misalign learned policies. We study preference alignment under an unknown and unrestricted link function. We show that realizability of f-divergence-constrained reward maximization in a policy class induces a semiparametric single-index binary choice model, where a scalar policy-dependent index captures all dependence on demonstrations and the remaining preference distribution is unrestricted. Rather than assuming this model has identifiable finite-dimensional structural parameters and estimating them, as in econometrics, we focus on policy learning with the reward function implicit, analyzing error to the optimal policy and allowing for…
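To make the single-index idea concrete, here is a minimal sketch that is not taken from the paper. It assumes the KL-divergence special case, where the scalar index is the familiar difference of scaled log-ratios between the policy and a reference policy; the paper itself treats general divergences. The function names (`kl_index`, `preference_prob`), the `beta` parameter, and the numeric log-probabilities are all hypothetical and chosen only for illustration. The point is that the same index gives different preference probabilities under different links, which is why fixing the link in advance can be a source of misspecification.

```python
import math

def kl_index(logp_a, logp_ref_a, logp_b, logp_ref_b, beta=1.0):
    """Scalar index for the assumed KL special case:
    beta * [(log pi(a) - log pi_ref(a)) - (log pi(b) - log pi_ref(b))]."""
    return beta * ((logp_a - logp_ref_a) - (logp_b - logp_ref_b))

def logistic_link(t):
    """Bradley-Terry (logistic) link."""
    return 1.0 / (1.0 + math.exp(-t))

def probit_link(t):
    """Standard normal CDF, used here as an alternative link."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def preference_prob(index, link):
    """Probability that response a is preferred to b.
    All dependence on the responses enters through the scalar index;
    the link itself is treated as unknown in the semiparametric setting."""
    return link(index)

# Hypothetical log-probabilities for two responses a and b.
index = kl_index(logp_a=-1.0, logp_ref_a=-2.0, logp_b=-3.0, logp_ref_b=-2.5)
p_logistic = preference_prob(index, logistic_link)
p_probit = preference_prob(index, probit_link)
print(index, p_logistic, p_probit)
```

Both links agree on which response is preferred, since each is increasing and equals one half at zero, but they disagree on how strongly, so a method that hard-codes one of them would fit a different reward scale than the other.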
Taxonomy
Topics: Game Theory and Voting Systems · Machine Learning and Data Classification · Recommender Systems and Techniques
