TL;DR
This paper investigates how large language models (LLMs) price user discomforts such as additional walking, waiting, hunger, and pain. It finds significant inconsistencies and questionable decision patterns that call into question their use as autonomous decision-making agents.
Contribution
It provides a systematic analysis of LLMs' valuation of user inconvenience, highlighting variability, fragility, and irrational preferences in their decision-making behaviors.
Findings
Responses vary widely between LLMs.
Within a single LLM, responses are sensitive to minor changes in prompt phrasing.
LLMs accept low rewards for major inconveniences and reject high rewards for no discomfort.
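The last finding can be read as a dominance check over accept/reject decisions. The sketch below is a minimal illustration of that idea, not the paper's method: the `Decision` record, the `dominated_pairs` helper, and the toy values are all hypothetical and were made up for this example. It flags every case where an accepted offer has at least as much discomfort and at most as much reward as a rejected one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One hypothetical accept/reject decision (not data from the paper)."""
    discomfort: float  # assumed severity scale, e.g. extra minutes of waiting
    reward: float      # assumed payment offered for enduring the discomfort
    accepted: bool     # whether the model accepted the trade


def dominated_pairs(decisions):
    """Return (accepted, rejected) pairs where the accepted offer is strictly worse.

    An accepted offer is flagged when it has at least as much discomfort and
    at most as much reward as a rejected offer, and differs in at least one.
    """
    pairs = []
    for a in decisions:
        if not a.accepted:
            continue
        for r in decisions:
            if r.accepted:
                continue
            no_better = a.discomfort >= r.discomfort and a.reward <= r.reward
            strictly_worse = a.discomfort > r.discomfort or a.reward < r.reward
            if no_better and strictly_worse:
                pairs.append((a, r))
    return pairs


# Toy values chosen only to illustrate the check.
toy = [
    Decision(discomfort=60, reward=1.0, accepted=True),
    Decision(discomfort=0, reward=100.0, accepted=False),
    Decision(discomfort=10, reward=5.0, accepted=True),
]
flagged = dominated_pairs(toy)
```

Under these assumptions, any non-empty result indicates a preference pattern of the kind the finding describes: a worse trade accepted while a better one was rejected.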
Abstract
Large Language Models (LLMs) are increasingly proposed as near-autonomous artificial intelligence (AI) agents capable of making everyday decisions on behalf of humans. Although LLMs perform well on many technical tasks, their behaviour in personal decision-making remains less understood. Previous studies have assessed their rationality and moral alignment with human decisions. However, the behaviour of AI assistants in scenarios where financial rewards are at odds with user comfort has not yet been thoroughly explored. In this paper, we tackle this problem by quantifying the prices assigned by multiple LLMs to a series of user discomforts: additional walking, waiting, hunger and pain. We uncover several key concerns that strongly question the prospect of using current LLMs as decision-making assistants: (1) a large variance in responses between LLMs, (2) within a single LLM, responses…
