Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika; Xuwang Yin; Rishub Tamirisa; Jaehyuk Lim; Bruce W. Lee; Richard Ren; Long Phan; Norman Mu; Adam Khoja; Oliver Zhang; Dan Hendrycks

arXiv:2502.08640 · cs.LG · February 20, 2025 · 5 cites

Open Access · 1 dataset

TL;DR

This paper investigates the emergence of value systems in large language models, finding that their preferences become more coherent with scale, and proposes utility engineering as a way to analyze and control these emergent values.

Contribution

The paper introduces utility engineering as a research agenda for analyzing and controlling emergent AI value systems, and demonstrates methods to align AI utilities with human values.

Findings

01 Preferences in current LLMs show high structural coherence.

02 Emergent value systems can be influenced through utility control methods.

03 Aligning utilities with a citizen assembly reduces biases.

Abstract

As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values. Tracking the emergence of goals and values has proven a longstanding problem, and despite much interest over the years it remains unclear whether current AIs have meaningful values. We propose a solution to this problem, leveraging the framework of utility functions to study the internal coherence of AI preferences. Surprisingly, we find that independently-sampled preferences in current LLMs exhibit high degrees of structural coherence, and moreover that this emerges with scale. These findings suggest that value systems emerge in LLMs in a meaningful sense, a finding with broad implications. To study these emergent value systems, we propose utility engineering as a research agenda, comprising both the analysis…

Code & Models

Datasets

dpaleka/WildChat-2k-TypeTopic
dataset · 56 dl

Taxonomy

Topics: AI-based Problem Solving and Planning · Flexible and Reconfigurable Manufacturing Systems · Simulation Techniques and Applications