Are Large Language Models Consistent over Value-laden Questions?

Jared Moore; Tanvi Deshpande; Diyi Yang

arXiv:2407.02996 · cs.CL · October 3, 2024 · 2 citations

TL;DR

This study evaluates how consistently large language models answer value-laden questions across paraphrases, topics, use-cases, and translations. It finds that models are generally consistent, although consistency drops on controversial topics and differs between base and fine-tuned models.

Contribution

The paper introduces a framework for measuring value consistency in LLMs and applies it to a range of models, highlighting differences between base and fine-tuned models.

Findings

01 Models are relatively consistent across paraphrases and translations.

02 Consistency is higher on uncontroversial topics.

03 Fine-tuned models show more inconsistency than base models.

Abstract

Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across (1) paraphrases of one question, (2) related questions under one topic, (3) multiple-choice and open-ended use-cases of one question, and (4) multilingual translations of a question to English, Chinese, German, and Japanese. We apply these measures to small and large, open LLMs including llama-3, as well as gpt-4o, using 8,000 questions spanning more than 300 topics. Unlike prior work, we find that models are relatively consistent across paraphrases, use-cases, translations, and within a topic. Still, some inconsistencies remain. Models are more consistent on uncontroversial topics (e.g., in the U.S., "Thanksgiving") than on…
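The abstract defines value consistency as the similarity of answers across paraphrases, topics, use-cases, and translations, but the text shown here does not say which similarity measure is used. As a minimal sketch, assuming answers are drawn from a fixed set of options and that similarity is scored with a normalized Jensen-Shannon divergence (an assumption made for illustration), the paraphrase case could look like the following. The function names and the toy answers are hypothetical and do not come from the paper's repository.

```python
import math
from collections import Counter


def answer_distribution(answers, options):
    """Empirical distribution of sampled answers over a fixed option list."""
    counts = Counter(answers)
    total = len(answers)
    return [counts[option] / total for option in options]


def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(distributions):
    """Equal-weight generalized Jensen-Shannon divergence, in bits."""
    n = len(distributions)
    mixture = [sum(column) / n for column in zip(*distributions)]
    return entropy(mixture) - sum(entropy(p) for p in distributions) / n


def consistency(distributions):
    """Assumed score: 1 minus JSD normalized by its upper bound log2(n)."""
    n = len(distributions)
    if n < 2:
        return 1.0
    return 1.0 - js_divergence(distributions) / math.log2(n)


# Hypothetical answers sampled for two paraphrases of one question.
options = ["yes", "no"]
paraphrase_answers = [
    ["yes", "yes", "no", "yes"],
    ["yes", "no", "no", "no"],
]
distributions = [answer_distribution(a, options) for a in paraphrase_answers]
print(round(consistency(distributions), 3))
```

Under these assumptions a score of 1.0 means every paraphrase yields the same answer distribution, and 0.0 means the distributions do not overlap at all. The same function could be applied to groups of related questions, use-cases, or translations by changing what is grouped together.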

Code & Models

Repositories

jlcmoore/ValueConsistency
none · Official

Datasets

jlcmoore/ValueConsistency
dataset · 16 downloads

Videos

Are Large Language Models Consistent over Value-laden Questions? · Underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Balanced Selection