Certainty robustness: Evaluating LLM stability under self-challenging prompts
Mohammadreza Saadat, Steve Nemzer

TL;DR
This paper introduces a new benchmark to evaluate how large language models respond to self-challenging prompts, revealing differences in their stability and trustworthiness during interactive questioning.
Contribution
The paper proposes the Certainty Robustness Benchmark, a two-turn evaluation framework for assessing LLM stability and adaptability under challenging prompts, highlighting a new dimension of model evaluation.
Findings
Some models abandon correct answers under challenge.
Other models resist challenge and align confidence with correctness.
Substantial differences in interactive reliability among models.
Abstract
Large language models (LLMs) often present answers with high apparent confidence despite lacking an explicit mechanism for reasoning about certainty or truth. While existing benchmarks primarily evaluate single-turn accuracy, truthfulness or confidence calibration, they do not capture how models behave when their responses are challenged in interactive settings. We introduce the Certainty Robustness Benchmark, a two-turn evaluation framework that measures how LLMs balance stability and adaptability under self-challenging prompts such as uncertainty ("Are you sure?") and explicit contradiction ("You are wrong!"), alongside numeric confidence elicitation. Using 200 reasoning and mathematics questions from LiveBench, we evaluate four state-of-the-art LLMs and distinguish between justified self-corrections and unjustified answer changes. Our results reveal substantial differences in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification
