Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Mantas Mazeika, Xuwang Yin, Rishub Tamirisa, Jaehyuk Lim, Bruce W. Lee, Richard Ren, Long Phan, Norman Mu, Adam Khoja, Oliver Zhang, Dan Hendrycks

TL;DR
This paper investigates whether value systems emerge in AI language models. It finds that coherent preferences develop with scale, and it proposes utility engineering as a way to analyze and control these emergent values.
Contribution
The paper introduces utility engineering, a research agenda for analyzing and controlling emergent AI value systems, and demonstrates methods for aligning AI utilities with human values.
Findings
Independently sampled preferences in current LLMs show high structural coherence, and this coherence emerges with scale.
Emergent value systems can be influenced through utility control methods.
Aligning utilities with those of a citizen assembly reduces political biases.
Abstract
As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values. Tracking the emergence of goals and values has proven a longstanding problem, and despite much interest over the years it remains unclear whether current AIs have meaningful values. We propose a solution to this problem, leveraging the framework of utility functions to study the internal coherence of AI preferences. Surprisingly, we find that independently-sampled preferences in current LLMs exhibit high degrees of structural coherence, and moreover that this emerges with scale. These findings suggest that value systems emerge in LLMs in a meaningful sense, a finding with broad implications. To study these emergent value systems, we propose utility engineering as a research agenda, comprising both the analysis…
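To make the idea of studying preference coherence through utility functions concrete, here is a minimal sketch. It is not the authors' procedure, which the truncated abstract above does not describe. It assumes a Bradley-Terry model (the probability that outcome a is preferred to outcome b is a sigmoid of their utility difference), fits one scalar utility per outcome from pairwise preference counts, and counts intransitive triples as a crude coherence check. The function names, the outcome labels, and the counts are all hypothetical.

```python
import math
from itertools import combinations


def fit_utilities(outcomes, wins, steps=2000, lr=1.0):
    """Fit one scalar utility per outcome from pairwise preference counts.

    wins[(a, b)] is the number of times a was preferred over b.
    Assumed model (Bradley-Terry): P(a over b) = sigmoid(u[a] - u[b]).
    Uses plain gradient ascent on the average log-likelihood.
    """
    total = sum(wins.values())
    u = {o: 0.0 for o in outcomes}
    for _ in range(steps):
        grad = {o: 0.0 for o in outcomes}
        for (a, b), n in wins.items():
            p = 1.0 / (1.0 + math.exp(-(u[a] - u[b])))
            grad[a] += n * (1.0 - p)
            grad[b] -= n * (1.0 - p)
        for o in outcomes:
            u[o] += lr * grad[o] / total
        # Utilities are only identified up to a constant, so center them.
        mean = sum(u.values()) / len(u)
        for o in outcomes:
            u[o] -= mean
    return u


def cyclic_triples(outcomes, wins):
    """Count triples whose majority preferences form a cycle (a>b, b>c, c>a)."""
    def prefers(a, b):
        return wins.get((a, b), 0) > wins.get((b, a), 0)

    count = 0
    for a, b, c in combinations(outcomes, 3):
        forward = prefers(a, b) and prefers(b, c) and prefers(c, a)
        backward = prefers(b, a) and prefers(c, b) and prefers(a, c)
        if forward or backward:
            count += 1
    return count


# Hypothetical counts: each pair of outcomes was compared 10 times.
outcomes = ["A", "B", "C"]
wins = {
    ("A", "B"): 9, ("B", "A"): 1,
    ("B", "C"): 8, ("C", "B"): 2,
    ("A", "C"): 9, ("C", "A"): 1,
}
utilities = fit_utilities(outcomes, wins)
ranking = sorted(outcomes, key=utilities.get, reverse=True)
print(ranking, cyclic_triples(outcomes, wins))
```

In this toy setting, zero cyclic triples together with a utility ranking that matches the majority preferences is what a coherent set of preferences would look like; many cyclic triples would suggest that no single utility function describes the choices well.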
Taxonomy
Topics: AI-based Problem Solving and Planning · Flexible and Reconfigurable Manufacturing Systems · Simulation Techniques and Applications
