Do LLMs have Consistent Values?

Naama Rozen; Liat Bezalel; Gal Elidan; Amir Globerson; Ella Daniel

arXiv:2407.12878·cs.CL·October 16, 2024

TL;DR

This paper investigates whether large language models exhibit human-like value structures, finding that with specific prompting strategies, LLMs can demonstrate value patterns similar to humans, advancing understanding of AI moral alignment.

Contribution

The study introduces a novel prompting method called 'Value Anchoring' that reveals human-like value structures in LLMs, enhancing assessment techniques for AI consistency.

Findings

01. Value structures in LLMs depend on prompting strategy.

02. 'Value Anchoring' prompts lead to human-like value agreement.

03. The approach offers new methods for evaluating LLM moral consistency.

Abstract

Large Language Models (LLM) technology is constantly improving towards human-like dialogue. Values are a basic driving force underlying human behavior, but little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values. We show that the results of this analysis depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as "Value Anchoring") the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs, as well as introduce novel methods for assessing consistency in LLM responses.
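
The abstract compares LLM and human data on the ranking of values. One common way to quantify agreement between two rankings is Spearman's rank correlation; the sketch below shows that idea only and is not the authors' analysis pipeline. The value names come from Schwartz's theory of basic values, while the scores and the names `human_scores`, `llm_scores`, and `spearman` are hypothetical and made up for illustration.

```python
from typing import Dict, List

# HYPOTHETICAL importance scores, invented for illustration.
# They are NOT results from the paper.
human_scores: Dict[str, float] = {
    "benevolence": 4.7, "universalism": 4.4, "self-direction": 4.3,
    "security": 4.0, "achievement": 3.6, "power": 2.4,
}
llm_scores: Dict[str, float] = {
    "benevolence": 4.9, "universalism": 4.8, "self-direction": 4.1,
    "security": 4.2, "achievement": 3.3, "power": 2.0,
}


def ranks(xs: List[float]) -> List[float]:
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))


if __name__ == "__main__":
    values = sorted(human_scores)  # fix a shared ordering of value names
    rho = spearman([human_scores[v] for v in values],
                   [llm_scores[v] for v in values])
    print(f"Spearman rank agreement (hypothetical data): {rho:.2f}")
```

A coefficient near 1 would mean the two sources order the values almost identically, and a coefficient near -1 would mean the orderings are reversed. The same `pearson` helper could be applied to pairs of value scores across respondents to examine correlations between values, again as an assumption about one possible analysis.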

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 5·Confidence 3

Strengths

This paper draws on concepts and survey materials from psychology literature. So, this paper stands on a secure theoretical foundation around how human values are defined and conceptualized.

Weaknesses

The statement “little research has been done to study the values exhibited in text generated by LLMs” in the abstract (and echoed repeatedly in the introduction) overly downplays the amount of attention this area of research has received in the past five years. That is, it really seems to ignore all of the research that was around even in the era of BERT family models. Some prominent examples from the past five years: Social Chemistry 101 by Forbes et al. in 2020, Argyle et al. 2022’s work on si

Reviewer 02·Rating 3·Confidence 4

Strengths

The proposed study is a novel fusion of LLM behaviors and value psychology. The use of Value Anchor may bring out more human-like behaviors in LLMs, which is an interesting finding. The authors also demonstrate via different prompts that LLMs can consistently mirror psychological value traits of a certain population of humans. The presentation uses clear figures and concise writing. The background knowledge on value measurements is sufficiently introduced, making the paper easy to follow.

Weaknesses

The question of whether LLMs share the same value structure as humans is interesting, but the practical uses and the implications for building better LLMs remain a little unclear. It might be more interesting to shed some light on how the results could help improve LLM behaviors. In addition, the ground-truth human responses used for comparison may exhibit certain biases. As written in lines 245-247, the mean age of participants was 34.2, and 59% were female. Does it cover a fuller spectrum of human su

Reviewer 03·Rating 6·Confidence 4

Strengths

1. The paper is well-written and easy to follow.
2. The work is well-grounded in established psychological theory, particularly Schwartz's Theory of Basic Human Values.
3. The use of value rankings and correlations provides concrete, measurable ways to compare LLM outputs to human data.
4. The paper studies a timely and important issue.

Weaknesses

1. The experimental results could be made stronger by analyzing whether minor variations in the same prompt could elicit the same results. Since language models often respond differently to small changes in wording, showing how the results hold up with different prompts would add a lot of value. A bit more discussion around this could help understand how stable the findings really are.
2. It would be great to see the value rankings and correlation structures explored in generation tasks as well

Taxonomy

Topics·Cooperative Studies and Economics