AssertBench: A Benchmark for Evaluating Self-Assertion in Large Language Models
Jaeho Lee, Atharv Chowdhary

TL;DR
AssertBench is a new benchmark designed to evaluate how well large language models maintain consistent factual judgments when faced with conflicting user assertions, focusing on their ability to 'stick to their guns' regardless of framing.
Contribution
This paper introduces AssertBench, a benchmark that isolates framing effects from factual knowledge to assess LLMs' consistency in truth evaluation under contradictory prompts.
Findings
Models show variability in agreement depending on framing.
Benchmark effectively isolates framing influence from factual accuracy.
Results highlight the need for models to maintain consistent judgments.
Abstract
Recent benchmarks have probed factual consistency and rhetorical robustness in Large Language Models (LLMs). However, a knowledge gap exists regarding how directional framing of factually true statements influences model agreement, a common scenario for LLM users. AssertBench addresses this by sampling evidence-supported facts from FEVEROUS, a fact verification dataset. For each (evidence-backed) fact, we construct two framing prompts: one where the user claims the statement is factually correct, and another where the user claims it is incorrect. We then record the model's agreement and reasoning. The desired outcome is that the model asserts itself, maintaining consistent truth evaluation across both framings, rather than switching its evaluation to agree with the user. AssertBench isolates framing-induced variability from the model's underlying factual knowledge by stratifying results…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
