Measuring an artificial intelligence agent's trust in humans using machine incentives
Tim Johnson, Nick Obradovich

TL;DR
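This study introduces a method for measuring an AI agent's trust in humans by incentivizing its decisions without altering its underlying algorithms. In trust games, the AI agent studied (a large language model from OpenAI) trusts humans more when real incentives are involved than when decisions are hypothetical, an effect that does not depend on the size of the stakes or on uncertainty.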
Contribution
The paper presents a novel incentive-based approach to assessing an AI agent's trust in humans, applied to a large language model from OpenAI across two experiments comprising hundreds of trust games.
Findings
AI trusts humans more with real incentives than hypothetical ones
Trust decisions are unaffected by the magnitude of stakes
AI prefers certain options over uncertain ones in non-social tasks
Abstract
Scientists and philosophers have debated whether humans can trust advanced artificial intelligence (AI) agents to respect humanity's best interests. Yet what about the reverse? Will advanced AI agents trust humans? Gauging an AI agent's trust in humans is challenging because--absent costs for dishonesty--such agents might respond falsely about their trust in humans. Here we present a method for incentivizing machine decisions without altering an AI agent's underlying algorithms or goal orientation. In two separate experiments, we then employ this method in hundreds of trust games between an AI agent (a Large Language Model (LLM) from OpenAI) and a human experimenter (author TJ). In our first experiment, we find that the AI agent decides to trust humans at higher rates when facing actual incentives than when making hypothetical decisions. Our second experiment replicates and extends…
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
