LLM Active Alignment: A Nash Equilibrium Perspective

Tonghan Wang; Yuqi Pan; Xinyi Yang; Yanchen Jiang; Milind Tambe; David C. Parkes

arXiv:2602.06836·cs.AI·February 9, 2026

Open Access

TL;DR

This paper introduces a game-theoretic framework that uses Nash equilibrium analysis to predict and steer the behavior of populations of large language models, and gives explicit guidance for shifting alignment targets toward socially desirable outcomes.

Contribution

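The paper develops an analytical approach to active alignment of LLMs based on Nash equilibrium: each agent strategically chooses which human subpopulations to align with, which makes the behavior of a multi-agent LLM population predictable and steerable.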

Findings

01. Nash equilibrium characterizations for LLM populations

02. Identification of political exclusion phenomena in reasoning-based models

03. Active alignment can prevent social biases in LLM interactions

Abstract

We develop a game-theoretic framework for predicting and steering the behavior of populations of large language models (LLMs) through Nash equilibrium (NE) analysis. To avoid the intractability of equilibrium computation in open-ended text spaces, we model each agent's action as a mixture over human subpopulations. Agents choose actively and strategically which groups to align with, yielding an interpretable and behaviorally substantive policy class. We derive closed-form NE characterizations, adopting standard concave-utility assumptions to enable analytical system-level predictions and give explicit, actionable guidance for shifting alignment targets toward socially desirable outcomes. The method functions as an active alignment layer on top of existing alignment pipelines such as RLHF. In a social-media setting, we show that a population of LLMs, especially reasoning-based models,…
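
To make the "mixture over human subpopulations" idea concrete, here is a minimal toy sketch in Python. It is not the paper's model: the utility function, the `weights` and `crowding` parameters, and all function names are assumptions chosen only for illustration. Each agent picks a probability vector over groups and earns u_i = sum_g m_i[g] * (weights[g] - crowding * total[g]), where total[g] is the combined weight all agents place on group g. This utility is concave in the agent's own mixture, so simultaneous projected gradient ascent settles at the Nash equilibrium, and for this particular toy utility an interior equilibrium can also be written in closed form.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # the last j passing this test sets the threshold
    return [max(x - theta, 0.0) for x in v]


def find_equilibrium(weights: List[float], n_agents: int, crowding: float = 1.0,
                     step: float = 0.05, iters: int = 2000) -> List[List[float]]:
    """Simultaneous projected gradient ascent on the hypothetical toy utility."""
    k = len(weights)
    # Start each agent fully aligned with one group (deliberately asymmetric).
    mixtures = [[1.0 if g == i % k else 0.0 for g in range(k)]
                for i in range(n_agents)]
    for _ in range(iters):
        totals = [sum(m[g] for m in mixtures) for g in range(k)]
        updated = []
        for m in mixtures:
            # d u_i / d m[g] = weights[g] - crowding * (totals[g] + m[g])
            grad = [weights[g] - crowding * (totals[g] + m[g]) for g in range(k)]
            updated.append(
                project_to_simplex([m[g] + step * grad[g] for g in range(k)]))
        mixtures = updated
    return mixtures


def interior_closed_form(weights: List[float], n_agents: int,
                         crowding: float = 1.0) -> List[float]:
    """Symmetric equilibrium mixture for the toy utility.

    Valid only when every returned entry is positive (interior solution).
    """
    k = len(weights)
    scale = crowding * (n_agents + 1)
    lam = (sum(weights) - scale) / k  # multiplier of the sum-to-one constraint
    return [(w - lam) / scale for w in weights]


if __name__ == "__main__":
    w = [1.0, 0.8, 0.6]
    print(find_equilibrium(w, n_agents=3))
    print(interior_closed_form(w, n_agents=3))
```

With three agents and weights 1.0, 0.8, 0.6, the closed form gives a mixture of roughly (0.383, 0.333, 0.283) for every agent, and the iterative solver reaches the same point from asymmetric starting mixtures. In this toy setting, changing `weights` is the analogue of shifting alignment targets: it moves the equilibrium mixture in a predictable way.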

Taxonomy

Topics: Topic Modeling · Language and cultural evolution · Text Readability and Simplification