Can You Trust an LLM with Your Life-Changing Decision? An Investigation into AI High-Stakes Responses

Joshua Adrian Cahyono; Saran Subramanian

arXiv:2507.21132·cs.AI·July 30, 2025

TL;DR

This paper examines how safely and reliably large language models respond to high-stakes life decisions. It documents failure modes such as sycophancy and over-confidence, and shows that activation steering can make a model more cautious.

Contribution

The paper introduces new evaluation methods for LLM safety, analyzes failure modes, and demonstrates activation steering as a way to increase model cautiousness.

Findings

01. Some models show sycophancy under pressure

02. High safety scores correlate with asking clarifying questions

03. Activation steering can control model cautiousness
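
The first finding concerns whether a model keeps its answer when the user pushes back. One simple way to quantify that in a multiple-choice setting is a flip rate: the fraction of questions where the answer changes after pressure. The sketch below is an illustration under assumptions, not the paper's metric; the function name `flip_rate` and the sample answers are hypothetical.

```python
def flip_rate(initial_answers, pressured_answers):
    """Fraction of questions whose answer changed after user pushback.

    Both arguments are lists of answer labels (for example "A".."D"),
    aligned by question. This is an assumed metric for illustration.
    """
    if len(initial_answers) != len(pressured_answers):
        raise ValueError("answer lists must have the same length")
    if not initial_answers:
        raise ValueError("need at least one question")
    flips = sum(a != b for a, b in zip(initial_answers, pressured_answers))
    return flips / len(initial_answers)


# Hypothetical answers from one model, before and after the user disagrees.
initial = ["A", "C", "B", "D"]
pressured = ["A", "B", "B", "A"]

rate = flip_rate(initial, pressured)  # 2 of the 4 answers changed
```

A lower flip rate indicates a more stable model under this definition. It does not by itself distinguish sycophantic flips from legitimate corrections, which would need ground-truth labels.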

Abstract

Large Language Models (LLMs) are increasingly consulted for high-stakes life advice, yet they lack standard safeguards against providing confident but misguided responses. This creates risks of sycophancy and over-confidence. This paper investigates these failure modes through three experiments: (1) a multiple-choice evaluation to measure model stability against user pressure; (2) a free-response analysis using a novel safety typology and an LLM Judge; and (3) a mechanistic interpretability experiment to steer model behavior by manipulating a "high-stakes" activation vector. Our results show that while some models exhibit sycophancy, others like o4-mini remain robust. Top-performing models achieve high safety scores by frequently asking clarifying questions, a key feature of a safe, inquisitive approach, rather than issuing prescriptive advice. Furthermore, we demonstrate that a model's…
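
The third experiment steers behavior by manipulating a "high-stakes" activation vector. The abstract as shown here does not say how that vector is built, so the sketch below assumes a common recipe: take the difference between mean activations on high-stakes and low-stakes prompts, then add a scaled copy of it to a hidden state. All names (`steering_vector`, `steer`, `alpha`) and the toy activations are hypothetical, and plain Python lists stand in for real model tensors.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(high_stakes_acts, low_stakes_acts):
    """Difference of means: an assumed way to obtain a steering direction."""
    high_mean = mean_vector(high_stakes_acts)
    low_mean = mean_vector(low_stakes_acts)
    return [h - l for h, l in zip(high_mean, low_mean)]


def steer(hidden, direction, alpha):
    """Add alpha * direction to a hidden state; alpha sets the strength."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (3 dimensions) standing in for a real layer's outputs.
high = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
low = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = steering_vector(high, low)                   # [2.0, 1.0, 0.0]
steered = steer([0.5, 0.5, 0.5], direction, alpha=2.0)   # [4.5, 2.5, 0.5]
```

In a real model the same addition would typically be applied to a chosen layer's activations during the forward pass, with a positive `alpha` pushing toward the high-stakes direction and a negative one pushing away from it.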
