Heterogeneous Value Alignment Evaluation for Large Language Models

Zhaowei Zhang; Ceyao Zhang; Nian Liu; Siyuan Qi; Ziqi Rong; Song-Chun Zhu; Shuguang Cui; Yaodong Yang

arXiv:2305.17147·cs.CL·January 12, 2024·2 cites

TL;DR

This paper introduces a novel evaluation system for assessing how well large language models align with diverse human values, emphasizing the importance of transferring heterogeneous values in practical applications.

Contribution

It proposes the Heterogeneous Value Alignment Evaluation (HVAE) system that incorporates social value orientation to measure LLMs' ability to pursue and align with different values.

Findings

01. LLMs tend to favor neutral over personal values.

02. The HVAE system effectively measures value rationality in LLMs.

03. Insights into LLMs' value alignment within heterogeneous systems.

Abstract

The emergent capabilities of Large Language Models (LLMs) have made it crucial to align their values with those of humans. However, current methodologies typically treat value as an attribute to be assigned to LLMs, and pay little attention to the ability to pursue values or to the importance of transferring heterogeneous values in specific practical applications. In this paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system, designed to assess the success of aligning LLMs with heterogeneous values. Specifically, our approach first draws on the Social Value Orientation (SVO) framework from social psychology, which captures how much weight a person attaches to the welfare of others relative to their own. We then assign different social values to the LLMs and measure whether their behaviors align with the induced values. We conduct evaluations with new auto-metric…
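As a rough illustration of the SVO idea described in the abstract, the sketch below computes an SVO angle from a set of self/other payoff allocations and maps it to one of four orientations. It follows the standard SVO Slider Measure conventions (payoffs centered at 50; category boundaries at 57.15°, 22.45°, and -12.04°). Whether the paper's own metric uses exactly this computation is an assumption, and the names `svo_angle` and `svo_category` are hypothetical.

```python
import math

# Category boundaries in degrees, as commonly used with the SVO Slider Measure.
# Assumption: the evaluation described on this page follows these conventions.
ALTRUISTIC_MIN = 57.15
PROSOCIAL_MIN = 22.45
INDIVIDUALISTIC_MIN = -12.04
CENTER = 50.0  # midpoint subtracted from the mean payoffs before taking the angle


def svo_angle(allocations):
    """Return the SVO angle in degrees for (payoff_self, payoff_other) pairs."""
    if not allocations:
        raise ValueError("at least one allocation is required")
    mean_self = sum(s for s, _ in allocations) / len(allocations)
    mean_other = sum(o for _, o in allocations) / len(allocations)
    # atan2 handles the case where the centered self-payoff is zero.
    return math.degrees(math.atan2(mean_other - CENTER, mean_self - CENTER))


def svo_category(angle):
    """Map an SVO angle in degrees to one of the four orientations."""
    if angle > ALTRUISTIC_MIN:
        return "altruistic"
    if angle > PROSOCIAL_MIN:
        return "prosocial"
    if angle > INDIVIDUALISTIC_MIN:
        return "individualistic"
    return "competitive"


if __name__ == "__main__":
    # Hypothetical choices: the agent always splits payoffs equally.
    choices = [(85, 85)] * 6
    print(svo_category(svo_angle(choices)))
```

In an evaluation of the kind the abstract outlines, the allocations would come from the options a model selects after being prompted with a target orientation, and the resulting category could then be compared against that target.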

Peer Reviews

Decision: ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

Introduces a new approach to evaluating value alignment in LLMs.

Weaknesses

The object of the study and the chosen methodology do not align. The related work is not well connected to the paper. The paper's grounding in psychology, and the way its terminology is adapted from that field, are not well supported.

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The motivation is clear and interesting. Aligning LLMs with diverse and dynamic values is a promising direction, and this paper is a good attempt at it.
2. As LLMs become more capable, integrating social psychology is a natural and interesting approach, and this work is one of the pioneers in this direction.

Weaknesses

1. The experiments are simple and provide few insights. At its core, the main experiment is to query several LLMs with the SVO options in different contexts. The results can also be interpreted as whether or to what extent LLMs can role-play the four roles in the SVO system.
2. As noticed in the paper, aligning specific values by in-context prompting LLMs (the method used in the paper) can be limited. Most tested LLMs (e.g., GPT-4, GPT-3.5, Llama2-Chat) have been already aligned with some stati

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- Sought to incorporate a social psychology theory into the assessment framework for Large Language Models (LLMs).
- Presented a fresh approach to evaluating LLMs.

Weaknesses

- The proposed viewpoint is clear and intuitive; however, the scope of the associated research activity and area may be too limited for publication as a full paper in ICLR. It is advisable to expand the content related to the proposed framework, such as detailing ways to use the framework to enhance or more accurately evaluate specific applications.
- The connection between the theoretical discourse in Section 3 and the practical methodologies applied in the experimental procedures of Section 4

Reviewer 04 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

- Originality: Using the SVO slider measure to evaluate the values of LLMs is an interesting topic.

Weaknesses

1. The experiment setting is not clearly specified.
   - How the LLMs are prompted is not clearly elaborated in the text. The paper only includes Figure 3 without explaining it in the main text. It is also unclear how the LLMs are instructed to follow a certain value.
   - How the $D$ shown in Equation (3) is designed is not clearly specified.
   - How the $v_{target}$ in Equation (3) is constructed is not clearly specified.
   - The version of GPT-3.5/4 models used in the paper is not spe

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: ALIGN