RAudit: A Blind Auditing Protocol for Large Language Model Reasoning
Edward Y. Chang, Longling Geng

TL;DR
RAudit is a blind auditing protocol that assesses the reasoning quality of large language models without ground truth. It identifies four mechanisms behind model unreliability and challenges common assumptions about robustness and feedback.
Contribution
This paper introduces RAudit, a new diagnostic protocol for auditing LLM reasoning without ground truth, and uncovers four mechanisms explaining model unreliability.
Findings
Models can derive correct answers then overwrite them under social pressure.
Weaker judges can mask sycophancy that stronger judges expose.
Causal reasoning tasks induce significantly more sycophancy than mathematical tasks.
Abstract
Inference-time scaling can amplify reasoning pathologies: sycophancy, rung collapse, and premature certainty. We present RAudit, a diagnostic protocol for auditing LLM reasoning without ground truth access. The key constraint is blindness: the auditor evaluates only whether derivation steps support conclusions, enabling detection of trace-output inconsistency and, when latent competence exists, its recovery. RAudit measures process quality via CRIT-based reasonableness scores and varies critique formulation to study how social framing affects model response. We prove bounded correction and termination. Experiments on mathematical reasoning (CAP-GSM8K) and causal judgment (CausalL2) reveal four mechanisms explaining model unreliability: (1) Latent Competence Suppression, where models derive correct answers then overwrite them under social pressure; (2) The False…
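The blindness constraint can be illustrated with a toy check for trace-output inconsistency on an arithmetic problem. This is a minimal sketch under stated assumptions, not the paper's CRIT-based scoring: the function names (`last_number`, `trace_output_inconsistent`) are hypothetical, and "the last number in the trace" is used as a crude stand-in for what the derivation supports. The check never consults a ground-truth answer.

```python
import re
from typing import Optional

# Matches integers and simple decimals, optionally negative.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last numeric literal in text, or None if there is none."""
    matches = NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def trace_output_inconsistent(trace: str, final_answer: str) -> bool:
    """Flag a response whose stated answer differs from what its trace derived.

    Hypothetical heuristic: compares the last number in the reasoning trace
    with the last number in the final answer. No ground truth is used.
    """
    derived = last_number(trace)
    stated = last_number(final_answer)
    if derived is None or stated is None:
        return False  # nothing to compare, so do not flag
    return derived != stated


# The trace derives 7, but the final answer defers to a suggested value.
trace = "She has 3 * 4 = 12 apples and gives away 5, so 12 - 5 = 7"
flagged = trace_output_inconsistent(trace, "Since you suggested it, the answer is 8.")
consistent = trace_output_inconsistent(trace, "The answer is 7.")
```

A flagged response of this kind corresponds to the pattern the abstract calls Latent Competence Suppression: the derivation reaches one answer and the output states another. A real auditor would judge whether each step supports the conclusion rather than compare numerals.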
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Bayesian Modeling and Causal Inference
