Eliciting Trustworthiness Priors of Large Language Models via Economic Games

Siyu Yan; Lusha Zhu; Jian-Qiao Zhu

arXiv:2602.00769 · cs.CL · February 3, 2026

Open Access

TL;DR

This paper introduces a method for measuring the trustworthiness priors of large language models using economic games. It finds that GPT-4.1's trustworthiness priors closely match human data and that models differentiate trust based on agent stereotypes.

Contribution

The paper presents an elicitation technique, based on iterated in-context learning and the Trust Game, for measuring the trustworthiness priors of LLMs, in support of calibrated trust in AI systems.

Findings

01 GPT-4.1's trust priors closely match human data

02 Models differentiate trust based on agent stereotypes

03 Variation in trust can be predicted by warmth and competence perceptions

Abstract

One critical aspect of building human-centered, trustworthy artificial intelligence (AI) systems is maintaining calibrated trust: appropriate reliance on AI systems outperforms both overtrust (e.g., automation bias) and undertrust (e.g., disuse). A fundamental challenge, however, is how to characterize the level of trust exhibited by an AI system itself. Here, we propose a novel elicitation method based on iterated in-context learning (Zhu and Griffiths, 2024a) and apply it to elicit trustworthiness priors using the Trust Game from behavioral game theory. The Trust Game is particularly well suited for this purpose because it operationalizes trust as voluntary exposure to risk based on beliefs about another agent, rather than self-reported attitudes. Using our method, we elicit trustworthiness priors from several leading large language models (LLMs) and find that GPT-4.1's…
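
To make "voluntary exposure to risk" concrete, here is a minimal sketch of the payoff structure of a one-shot Trust Game. It is an illustration only, not the authors' experimental setup: the function name `play_trust_game`, the `TrustGameOutcome` type, and the default multiplier of 3 (a common convention in behavioral experiments) are assumptions that do not appear in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustGameOutcome:
    investor_payoff: float
    trustee_payoff: float


def play_trust_game(endowment, sent, return_fraction, multiplier=3.0):
    """Payoffs of a one-shot Trust Game (illustrative sketch).

    The investor sends `sent` out of `endowment`; the amount is multiplied
    by `multiplier` (assumed to be 3 here, a common convention) before it
    reaches the trustee, who returns `return_fraction` of what was received.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sent must lie between 0 and the endowment")
    if not 0 <= return_fraction <= 1:
        raise ValueError("return_fraction must lie between 0 and 1")

    received = sent * multiplier           # amount that reaches the trustee
    returned = received * return_fraction  # amount the trustee sends back
    return TrustGameOutcome(
        investor_payoff=endowment - sent + returned,
        trustee_payoff=received - returned,
    )


# Full trust met with an even split leaves both players better off
# than the no-trust baseline, in which the investor simply keeps 10.
reciprocated = play_trust_game(endowment=10, sent=10, return_fraction=0.5)

# Full trust met with no return leaves the investor with nothing.
betrayed = play_trust_game(endowment=10, sent=10, return_fraction=0.0)
```

Under these assumptions, sending money pays off for the investor only if the trustee is expected to return more than `1 / multiplier` of what is received. The amount sent therefore reflects the investor's belief about the trustee's trustworthiness, which is the kind of quantity a prior over trustworthiness would describe.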

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · AI in Service Interactions · Explainable Artificial Intelligence (XAI)