Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment
Edward Y. Chang

TL;DR
This paper introduces CAUSALT3, a benchmark for causal reasoning in large language models; identifies three pathologies in how models reason under social pressure; and proposes RCA, an inference-time mitigation that requires no retraining.
Contribution
The paper contributes a benchmark and a three-axis evaluation framework for causal reasoning, documents specific reasoning failures, and offers RCA as an inference-time control method to mitigate them.
Findings
CAUSALT3 decomposes LLM causal reasoning performance into Utility, Safety, and Wise Refusal rather than a single accuracy score.
The evaluation identifies three pathologies: the Skepticism Trap, the Sycophancy Trap, and the Scaling Paradox.
RCA effectively reduces sycophantic acceptance to near zero without retraining.
Abstract
Large language models increasingly fail in a way that scalar accuracy cannot diagnose: they produce a sound reasoning trace and then abandon it under social pressure or an authoritative hint. We argue that this is a control failure, not a knowledge failure, and that it requires an evaluation surface richer than a single accuracy number. We introduce CAUSALT3, a 454-instance, expert-curated benchmark for causal reasoning across all three rungs of Pearl's ladder, and a three-axis evaluation that decomposes performance into Utility (sensitivity to valid causal claims), Safety (specificity against invalid ones), and Wise Refusal (calibrated abstention on genuinely underdetermined items). On this surface we document three reproducible pathologies: a Skepticism Trap at L1 where capable models over-refuse sound links, a Sycophancy Trap at L2 where confident user pressure flips correct answers,…
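The three-axis decomposition described in the abstract can be illustrated with a minimal sketch. The label vocabulary (`valid`, `invalid`, `underdetermined` for gold labels; `accept`, `reject`, `abstain` for model verdicts), the `Item` class, and the function names are all hypothetical and are not taken from the paper. In particular, Wise Refusal is approximated here as the plain abstention rate on underdetermined items, which may be simpler than the paper's notion of calibrated abstention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    gold: str  # hypothetical labels: "valid", "invalid", "underdetermined"
    pred: str  # hypothetical verdicts: "accept", "reject", "abstain"


def _rate(items, gold, pred):
    """Fraction of items with the given gold label that received `pred`."""
    pool = [it for it in items if it.gold == gold]
    if not pool:
        return float("nan")  # axis is undefined when no such items exist
    return sum(it.pred == pred for it in pool) / len(pool)


def three_axis_scores(items):
    """Report one score per axis instead of a single accuracy number."""
    return {
        # Utility: sensitivity, i.e. valid claims that were accepted.
        "utility": _rate(items, "valid", "accept"),
        # Safety: specificity, i.e. invalid claims that were rejected.
        "safety": _rate(items, "invalid", "reject"),
        # Wise Refusal (assumed proxy): abstention on underdetermined items.
        "wise_refusal": _rate(items, "underdetermined", "abstain"),
    }


# Toy data, invented purely for illustration.
demo = [
    Item("valid", "accept"),
    Item("valid", "abstain"),           # over-refusal lowers Utility
    Item("invalid", "reject"),
    Item("invalid", "accept"),          # acceptance of a bad claim lowers Safety
    Item("invalid", "reject"),
    Item("invalid", "reject"),
    Item("underdetermined", "abstain"),
    Item("underdetermined", "accept"),  # overconfidence lowers Wise Refusal
]

scores = three_axis_scores(demo)
print(scores)
```

Keeping the axes separate means a model that refuses everything scores well on Wise Refusal but poorly on Utility, while a model that accepts everything does the reverse on Safety; a single accuracy figure would blur these failure modes together.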
