Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling
Yang Cai, Weiqiang Zheng

TL;DR
This paper introduces a new framework for aligning large language models through test-time scaling, demonstrating optimal convergence rates and highlighting limitations of existing methods like NLHF in preserving output diversity.
Contribution
The paper formalizes asymptotic universal alignment via test-time scaling, characterizes the optimal convergence rate, and proposes a novel approach that maintains output diversity for better alignment.
Findings
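The optimal win rate under test-time scaling is f(k)=k/(k+1), where k is the number of candidate responses produced per prompt.
Existing methods such as Nash learning from human feedback (NLHF) underutilize the benefits of test-time scaling.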
Proposed symmetric multi-player alignment games achieve optimal alignment with diverse outputs.
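The rate f(k)=k/(k+1) stated in the findings can be tabulated directly. The sketch below is only an illustration of that formula, not code from the paper; the function names `optimal_win_rate` and `min_samples_for` are hypothetical.

```python
from fractions import Fraction
import math


def optimal_win_rate(k: int) -> Fraction:
    """Return f(k) = k / (k + 1), the rate quoted in the findings."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Fraction(k, k + 1)


def min_samples_for(target: Fraction) -> int:
    """Smallest k with k / (k + 1) >= target, for 0 <= target < 1.

    Rearranging k / (k + 1) >= t gives k >= t / (1 - t).
    """
    if not (0 <= target < 1):
        raise ValueError("target must lie in [0, 1)")
    return max(1, math.ceil(target / (1 - target)))


if __name__ == "__main__":
    for k in (1, 2, 4, 9, 99):
        print(k, optimal_win_rate(k), float(optimal_win_rate(k)))
    print(min_samples_for(Fraction(9, 10)))
```

Read this way, the guarantee approaches 1 only linearly in the number of samples: each additional candidate response shrinks the gap 1 - f(k) = 1/(k+1) a little less than the one before.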
Abstract
Aligning large language models (LLMs) to serve users with heterogeneous and potentially conflicting preferences is a central challenge for personalized and trustworthy AI. We formalize an ideal notion of universal alignment through test-time scaling: for each prompt, the model produces k ≥ 1 candidate responses and a user selects their preferred one. We introduce (k, f(k))-robust alignment, which requires the k-output model to have win rate f(k) against any other single-output model, and asymptotic universal alignment (U-alignment), which requires f(k) → 1 as k → ∞. Our main result characterizes the optimal convergence rate: there exists a family of single-output policies whose k-sample product policies achieve U-alignment at rate f(k) = k/(k+1), and no method can achieve a faster rate in general. We show that popular post-training methods, including Nash…
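One way to build intuition for the k/(k+1) figure is a symmetry argument in a toy setting. This is an assumption made here for illustration, not the paper's construction or proof: a single user's utility for each response is drawn independently from Uniform(0, 1), the k-output model and the single-output opponent sample from the same distribution, and the k-output model wins when the user prefers its best candidate. Among k+1 exchangeable draws, the top one belongs to the k-sample side with probability k/(k+1). The hypothetical helper `simulate_win_rate` checks this by Monte Carlo.

```python
import random


def simulate_win_rate(k: int, trials: int = 20000, seed: int = 0) -> float:
    """Estimate how often the best of k i.i.d. draws beats one i.i.d. draw.

    Toy model (an assumption, not the paper's setup): each draw is the
    user's utility for one sampled response, uniform on [0, 1).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    wins = 0
    for _ in range(trials):
        best_of_k = max(rng.random() for _ in range(k))
        opponent = rng.random()
        if best_of_k > opponent:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, round(simulate_win_rate(k), 3), round(k / (k + 1), 3))
```

The toy model only covers the symmetric case where both sides sample from the same policy; the abstract's claim is stronger, since it bounds the win rate against any single-output opponent.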
Taxonomy
Topics: Topic Modeling · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI
