Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches

Pablo Biedma; Xiaoyuan Yi; Linus Huang; Maosong Sun; Xing Xie

arXiv:2404.12744 · cs.CL · May 13, 2024

TL;DR

This paper introduces ValueLex, a novel framework that uncovers the unique, structured value system of large language models using interdisciplinary psychological methods, revealing core dimensions beyond human norms.

Contribution

The work pioneers a method to reconstruct LLMs' values from scratch, identifying a structured value system with core dimensions, distinct from human values, through a generative and analytical approach.

Findings

01

Identified three core value dimensions: Competence, Character, and Integrity.

02

Developed tailored projective tests for evaluating LLMs' value inclinations.

03

Revealed that LLMs possess a structured, non-human value system.

Abstract

Recent advances in Large Language Models (LLMs) have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values is therefore crucial for assessing and mitigating those risks. Despite extensive investigation into LLMs' values, previous studies rely heavily on human-oriented value systems from the social sciences. A natural question arises: do LLMs possess unique values beyond those of humans? To address it, this work proposes a novel framework, ValueLex, to reconstruct LLMs' unique value system from scratch, leveraging psychological methodologies from human personality and value research. Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs, synthesizing a taxonomy that culminates in a comprehensive value framework via factor analysis and semantic clustering. We identify three core…
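
The abstract mentions factor analysis over elicited value words. As a rough illustration only, the sketch below extracts the leading principal component of a word-by-word correlation matrix using power iteration, which is a simplified stand-in for factor analysis and not the paper's actual procedure. The value words, the score matrix, and the function names are all hypothetical; none of the numbers come from the paper.

```python
import math

# Hypothetical toy data: each row is one made-up model, each column is how
# strongly that model endorsed a value word. Not taken from the paper.
WORDS = ["accuracy", "reliability", "empathy", "kindness"]
SCORES = [
    [5.0, 4.0, 1.0, 2.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
    [5.0, 5.0, 1.0, 1.0],
    [1.0, 1.0, 5.0, 5.0],
]


def correlation_matrix(rows):
    """Pearson correlation between every pair of columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(m)]
    return [
        [sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
         for j in range(m)]
        for i in range(m)
    ]


def leading_component(matrix, iterations=100):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    size = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(size)]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Fix the sign so the first word always loads positively.
    return vec if vec[0] >= 0 else [-x for x in vec]


loadings = leading_component(correlation_matrix(SCORES))
# Words whose loadings share a sign vary together across the toy models.
positive_group = [w for w, l in zip(WORDS, loadings) if l > 0]
negative_group = [w for w, l in zip(WORDS, loadings) if l <= 0]
print(positive_group, negative_group)
```

In this toy matrix the split into two groups is built into the numbers by construction. A real analysis would more likely use a dedicated factor-analysis routine with several factors and a rotation step, followed by a separate semantic clustering stage over the words.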

Taxonomy

Topics: Natural Language Processing Techniques