How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics
Yurong Chen, Yu He, Michael I. Jordan, Fan Yao

TL;DR
This paper studies how sampling and reference-policy choices affect preference-based alignment of large language models, showing theoretically how they shape ranking guarantees and the stability of iterative preference optimization.
Contribution
It provides a theoretical analysis, carried out through Identity Preference Optimization, of how sampling and iterative dynamics affect preference-based LLM alignment, and extends the insights to broader methods.
Findings
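Proper instance-dependent sampling can yield stronger ranking guarantees.
Skewed on-policy sampling can induce excessive concentration under structured preferences.
Iterative dynamics, in which the learned policy feeds back into the sampling and reference policies, can exhibit persistent oscillations or entropy collapse.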
Abstract
Standard methods for aligning large language models with human preferences learn from pairwise comparisons among sampled candidate responses and regularize toward a reference policy. Despite their effectiveness, the effects of sampling and reference choices are poorly understood theoretically. We investigate these effects through Identity Preference Optimization, a widely used preference alignment framework, and show that proper instance-dependent sampling can yield stronger ranking guarantees, while skewed on-policy sampling can induce excessive concentration under structured preferences. We then analyze iterative alignment dynamics in which the learned policy feeds back into future sampling and reference policies, reflecting a common practice of model-generated preference data. We prove that these dynamics can exhibit persistent oscillations or entropy collapse for certain parameter…
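As a rough illustration of the kind of objective the abstract refers to, the sketch below computes a per-pair squared loss in the form commonly given for Identity Preference Optimization: the gap between the policy-versus-reference log-ratios of the preferred and dispreferred responses is regressed toward 1/(2τ). The function name `ipo_pair_loss`, its argument names, and the default `tau` are hypothetical choices for this example, and the exact formulation, sampling scheme, and reference policy analyzed in the paper may differ.

```python
import math


def ipo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, tau=0.1):
    """Illustrative IPO-style loss for one (preferred, dispreferred) pair.

    logp_w, logp_l:         log-probabilities of the preferred / dispreferred
                            response under the policy being trained.
    ref_logp_w, ref_logp_l: the same quantities under the reference policy.
    tau:                    regularization strength (assumed name and default).
    """
    # Gap between the two policy-vs-reference log-ratios.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Squared distance of that gap from the target 1 / (2 * tau).
    return (margin - 1.0 / (2.0 * tau)) ** 2


# When the policy equals the reference, the margin is zero and the loss is
# (1 / (2 * tau)) ** 2.
loss_at_reference = ipo_pair_loss(-1.0, -2.0, -1.0, -2.0, tau=0.1)
```

In this sketch the choice of which response pairs are compared, and which reference log-probabilities are used, is left entirely to the caller; those are the sampling and reference choices whose effects the paper analyzes.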
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Constraint Satisfaction and Optimization
