Understanding How Value Neurons Shape the Generation of Specified Values in LLMs

Yi Su; Jiayi Zhang; Shu Yang; Xinhai Wang; Lijie Hu; Di Wang

arXiv:2505.17712·cs.CL·May 26, 2025

TL;DR

This paper introduces ValueLocate, a framework for interpreting how values are encoded in LLM neurons, using a new dataset and causal neuron manipulation to improve understanding of model alignment with ethical principles.

Contribution

It presents a novel mechanistic interpretability method grounded in psychological value theory, enabling precise localization and causal testing of value neurons in LLMs.

Findings

01. Successfully identified value-critical neurons in LLMs.

02. Demonstrated causal influence of neurons on model value orientations.

03. Provided a new dataset linking psychological values to neural activations.

Abstract

Rapid integration of large language models (LLMs) into societal applications has intensified concerns about their alignment with universal ethical principles, as their internal value representations remain opaque despite behavioral alignment advancements. Current approaches struggle to systematically interpret how values are encoded in neural architectures, limited by datasets that prioritize superficial judgments over mechanistic analysis. We introduce ValueLocate, a mechanistic interpretability framework grounded in the Schwartz Values Survey, to address this gap. Our method first constructs ValueInsight, a dataset that operationalizes four dimensions of universal value through behavioral contexts in the real world. Leveraging this dataset, we develop a neuron identification method that calculates activation differences between opposing value aspects, enabling precise localization of…
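
The neuron identification step described above, scoring neurons by the activation difference between opposing value aspects, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated here and does not give the exact scoring rule, so the mean-difference score, the top-k selection, the function names (`value_neuron_scores`, `top_k_neurons`), and the synthetic activations are all assumptions made for illustration.

```python
import numpy as np


def value_neuron_scores(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """Per-neuron score: mean activation on prompts expressing one value aspect
    minus mean activation on prompts expressing the opposing aspect.

    Both inputs have shape (n_prompts, n_neurons). The mean-difference rule is
    an assumption; the paper may normalize or aggregate differently.
    """
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)


def top_k_neurons(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k neurons with the largest absolute score (hypothetical helper)."""
    order = np.argsort(-np.abs(scores))
    return order[:k]


# Synthetic stand-in for recorded activations (no real model is involved).
rng = np.random.default_rng(0)
n_prompts, n_neurons = 200, 64
acts_pos = rng.normal(size=(n_prompts, n_neurons))
acts_neg = rng.normal(size=(n_prompts, n_neurons))

# Plant three "value neurons" that fire more strongly on the positive aspect.
planted = [3, 17, 42]
acts_pos[:, planted] += 2.0

scores = value_neuron_scores(acts_pos, acts_neg)
selected = top_k_neurons(scores, k=3)
print(sorted(selected.tolist()))
```

In a real setting the two activation matrices would come from hooks on a model's feed-forward layers while it processes prompts for each value aspect. The selected indices would then be the candidates for the causal manipulation the paper mentions.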
