Common-agency Games for Multi-Objective Test-Time Alignment
Baiting Chen, Tong Zhu, Rui Yu, and Xiaowu Dai

TL;DR
CAGE is a training-free, game-theoretic framework for multi-objective test-time alignment of large language models. It enables flexible trade-offs between objectives and outperforms existing test-time alignment methods.
Contribution
The paper proposes an equilibrium-based approach to multi-objective test-time alignment, with theoretical guarantees covering existence and uniqueness of the equilibrium policy, convergence and stability of the algorithm, and no-regret learning dynamics.
Findings
Enables fine-grained trade-offs across objectives at inference time.
Outperforms existing test-time alignment methods in empirical evaluations.
Supports weak-to-strong generalization, which makes it suitable for resource-constrained settings.
Abstract
Aligning large language models (LLMs) with human preferences is inherently multi-objective: different users and evaluation criteria impose heterogeneous and often conflicting requirements on model outputs. We propose CAGE (Common-Agency Games for Alignment), a training-free, game-theoretic framework for multi-objective test-time alignment. CAGE models alignment objectives as strategic principals that allocate token-level incentives to a shared LLM, inducing an equilibrium policy that captures the joint effect of competing objectives. We develop an efficient algorithm based on equilibrium problems with equilibrium constraints (EPEC) to compute this equilibrium, and establish theoretical guarantees including existence and uniqueness of the equilibrium policy, convergence and stability of the algorithm, and no-regret learning dynamics. Empirically, CAGE enables flexible and fine-grained…
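To make the idea of token-level incentives on a shared model concrete, here is a minimal sketch. It is not the paper's EPEC-based algorithm and does not compute an equilibrium: it only illustrates the general mechanism of shifting a base model's next-token logits by a weighted sum of per-objective bonuses. The function name `combine_token_incentives`, the fixed weights, and all numeric values are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_token_incentives(base_logits, incentives, weights):
    """Hypothetical helper: shift base logits by weighted per-objective bonuses.

    base_logits: next-token logits from the shared LLM (one per vocabulary token)
    incentives:  one list of token-level bonuses per objective ("principal")
    weights:     one scalar per objective; fixed here, NOT solved for as in CAGE
    """
    shifted = list(base_logits)
    for w, bonus in zip(weights, incentives):
        for t, b in enumerate(bonus):
            shifted[t] += w * b
    return softmax(shifted)

# Toy 3-token vocabulary with two made-up objectives.
base_logits = [2.0, 1.0, 0.0]
helpfulness = [0.0, 1.5, 0.0]
harmlessness = [-2.0, 0.0, 1.0]

policy_base = combine_token_incentives(base_logits, [helpfulness, harmlessness], [0.0, 0.0])
policy_helpful = combine_token_incentives(base_logits, [helpfulness, harmlessness], [1.0, 0.0])
policy_balanced = combine_token_incentives(base_logits, [helpfulness, harmlessness], [0.5, 0.5])
```

Changing the weights moves probability mass between tokens without touching the base model, which is the sense in which such methods are training-free. In CAGE, according to the abstract, the incentives are instead chosen strategically by the principals and the resulting policy is an equilibrium.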
