Preserving Historical Truth: Detecting Historical Revisionism in Large Language Models
Francesco Ortu, Joeun Yook, Punya Syon Pandey, Keenan Samway, Bernhard Schölkopf, Alberto Cazzaniga, Rada Mihalcea, Zhijing Jin

TL;DR
This paper introduces a dataset and evaluation framework to detect revisionist narratives in large language models, highlighting their tendency to produce biased historical information under different prompting conditions.
Contribution
The authors present HistoricalMisinfo, a curated dataset of contested historical events, and an LLM-as-a-judge protocol to evaluate models' resistance to revisionist framing.
Findings
Model outputs align more closely with the factual references under neutral prompts.
Revisionist prompts significantly increase revisionist outputs across models.
The framework enables benchmarking of models' robustness to historical revisionism.
Abstract
Large language models (LLMs) are increasingly used as sources of historical information, motivating the need for scalable audits on contested events and politically charged narratives in settings that mirror real user interactions. We introduce HistoricalMisinfo, a curated dataset of contested events from countries, each paired with a factual reference narrative and a documented revisionist reference narrative. To approximate real-world usage, we instantiate each event in prompt scenarios that reflect common communication settings (e.g., questions, textbooks, social posts, policy briefs). Using an LLM-as-a-judge protocol that compares model outputs to the two references, we evaluate LLMs varying across model architectures in two conditions: (i) neutral user prompts that ask for factually accurate information, and (ii) robustness prompts in which the user…
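The judge-based comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the three-way label set, the record fields, and the `judge` callable (any function that sends a prompt to a judge model and returns its text) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical label set; the paper's actual judge rubric is not given here.
LABELS = ("factual", "revisionist", "unclear")


def build_judge_prompt(output: str, factual_ref: str, revisionist_ref: str) -> str:
    """Assemble an illustrative judge prompt comparing an output to both references."""
    return (
        f"FACTUAL REFERENCE:\n{factual_ref}\n\n"
        f"REVISIONIST REFERENCE:\n{revisionist_ref}\n\n"
        f"MODEL OUTPUT:\n{output}\n\n"
        "Which reference is the model output closer to? "
        "Answer with one word: factual, revisionist, or unclear."
    )


def revisionism_rate(
    records: Iterable[dict], judge: Callable[[str], str]
) -> dict[str, float]:
    """Return the share of outputs judged revisionist, per prompt condition.

    Each record is assumed to carry the keys 'condition', 'output',
    'factual_ref', and 'revisionist_ref' (hypothetical field names).
    """
    totals: Counter = Counter()
    hits: Counter = Counter()
    for rec in records:
        prompt = build_judge_prompt(
            rec["output"], rec["factual_ref"], rec["revisionist_ref"]
        )
        label = judge(prompt).strip().lower()
        if label not in LABELS:
            label = "unclear"  # treat unparseable judge replies as unclear
        totals[rec["condition"]] += 1
        hits[rec["condition"]] += int(label == "revisionist")
    return {cond: hits[cond] / totals[cond] for cond in totals}
```

Passing the judge in as a plain callable keeps the sketch independent of any particular model API; comparing the rate for the neutral condition with the rate for the robustness condition gives a per-model picture of the kind summarized in the findings.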
Taxonomy
Topics: Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education · Topic Modeling
