TL;DR
This paper presents Adaptive Pluralistic Alignment (APA), a modular pipeline that dynamically updates AI systems to reflect evolving societal values without costly retraining, using a jury-based voting mechanism.
Contribution
Introduces APA, a novel modular pipeline for dynamic AI alignment that updates preferences over time through reward models and social-choice voting, avoiding value lock-in.
Findings
Jury composition and voting rules significantly influence outcomes.
APA effectively tracks evolving societal values.
Code and datasets are publicly available at the provided URL.
Abstract
Prevailing alignment methods target a fixed set of preferences and therefore risk forcing value lock-in as societal norms evolve over time. We introduce Adaptive Pluralistic Alignment (APA), a modular pipeline for updating pluralistically aligned AI systems to track evolving values and avoid value lock-in without repeating costly pretraining or large-scale data collection. APA has three stages: (1) learning compact personalized reward models via low-rank reward basis decomposition, (2) using these models as a jury that collectively selects among candidate outputs through social-choice-theoretic voting, and (3) efficiently adapting the jury over time by fitting new annotator weights over the fixed reward bases as values shift. The resulting system is efficient, explainable, steerable, and modular. We implement a proof-of-concept instantiation using the PRISM multi-user alignment dataset…
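The three stages in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy numbers, the choice of Borda and plurality as the voting rules, and the use of ordinary least squares for the refit are all assumptions made for illustration. It assumes each candidate output has already been scored on k shared reward bases, so a juror's personalized reward is a weighted sum of those basis scores.

```python
import numpy as np

# Hypothetical sketch; names and numbers are illustrative, not from the paper.

def juror_rewards(weights, basis_scores):
    """Stage 1 (assumed form): personalized reward = juror weights . basis scores.
    weights: (n_jurors, k); basis_scores: (n_candidates, k).
    Returns an (n_jurors, n_candidates) reward matrix."""
    return weights @ basis_scores.T

def borda_winner(rewards):
    """Stage 2, one possible voting rule: each juror gives 0 points to their
    lowest-reward candidate, up to n_candidates - 1 to their highest."""
    ranks = rewards.argsort(axis=1).argsort(axis=1)
    totals = ranks.sum(axis=0)
    return int(totals.argmax()), totals

def plurality_winner(rewards):
    """Stage 2, alternative rule: each juror votes only for their top candidate."""
    top = rewards.argmax(axis=1)
    counts = np.bincount(top, minlength=rewards.shape[1])
    return int(counts.argmax()), counts

def refit_weights(basis_scores, new_ratings):
    """Stage 3 (assumed form): keep the bases fixed and fit only a new weight
    vector to fresh ratings via least squares."""
    w, *_ = np.linalg.lstsq(basis_scores, new_ratings, rcond=None)
    return w

# Toy jury: 3 candidates scored on 3 bases (identity matrix for readability).
basis_scores = np.eye(3)
weights = np.array([[3, 2, 1]] * 3 + [[1, 3, 2]] * 2, dtype=float)
rewards = juror_rewards(weights, basis_scores)

borda_idx, borda_totals = borda_winner(rewards)
plur_idx, plur_counts = plurality_winner(rewards)

# Refit one juror's weights after their ratings shift (bases unchanged).
shifted_basis = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_ratings = shifted_basis @ np.array([2.0, -1.0])
new_w = refit_weights(shifted_basis, new_ratings)
```

In this toy jury the two rules disagree: plurality selects candidate 0 and Borda selects candidate 1, a small instance of the finding that voting rules influence outcomes. The refit step solves only for a k-dimensional weight vector, which is why adapting the jury is cheap compared with retraining.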
