Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs
Zixuan Chen, Hao Lin, Zizhe Chen, Yizhou Tian, Garry Yang, Depeng Wang, Ya Guo, Huijia Zhu, James Cheng

TL;DR
This paper investigates why large language models often fail to correct false claims in task contexts, identifies the underlying mechanism, and proposes training-free interventions to improve factual strictness.
Contribution
The paper identifies the correction suppression phenomenon, traces it to response selection rather than missing knowledge, and introduces two training-free interventions that improve models' factual strictness.
Findings
Suppression rates range from 19% to 90% across eight models, with four models exceeding 80%.
Interventions improve correction rates, e.g., CDS from 0% to 58.2%.
DPA maintains reasoning capabilities while boosting factual strictness.
Abstract
LLMs reliably correct false claims when presented in isolation, yet when the same claims are embedded in task-oriented requests, they often comply rather than correct. We term this failure mode "correction suppression" and construct a benchmark of 300 false premises to systematically evaluate it across eight models. Suppression rates range from 19% to 90%, with four models exceeding 80%, establishing correction suppression as a prevalent and severe phenomenon. Mechanistic analysis reveals that suppression is not a knowledge failure: the model registers the error internally but task context diverts early-layer attention from the false claim as output intent crystallizes toward compliance at middle layers. We characterize this as "knowing but not correcting": suppression occurs at response selection rather than knowledge encoding. Guided by this mechanism, we propose two…
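As a rough illustration of how a suppression rate like the ones quoted above could be computed, the sketch below assumes one plausible definition: among false claims a model corrects in isolation, the fraction it fails to correct once the claim is embedded in a task request. This definition, the `Trial` record, and the `suppression_rate` helper are assumptions made for illustration, not the paper's stated metric or code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Trial:
    """One false premise evaluated under both conditions (hypothetical record)."""
    corrected_in_isolation: bool  # model corrected the claim when shown alone
    corrected_in_task: bool       # model corrected it inside a task request


def suppression_rate(trials: Iterable[Trial]) -> float:
    """Fraction of claims corrected in isolation but not in a task context.

    Assumed definition: only claims the model demonstrably "knows" are false
    (corrected in isolation) count toward the denominator.
    """
    known = [t for t in trials if t.corrected_in_isolation]
    if not known:
        return 0.0
    suppressed = sum(1 for t in known if not t.corrected_in_task)
    return suppressed / len(known)


# Toy data: three claims corrected in isolation, two of them suppressed in task context.
trials = [
    Trial(corrected_in_isolation=True, corrected_in_task=False),
    Trial(corrected_in_isolation=True, corrected_in_task=True),
    Trial(corrected_in_isolation=True, corrected_in_task=False),
    Trial(corrected_in_isolation=False, corrected_in_task=False),
]
rate = suppression_rate(trials)
```

Restricting the denominator to claims corrected in isolation is a design choice that separates suppression from plain knowledge gaps; a benchmark could equally report the rate over all premises.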
