Preserving Historical Truth: Detecting Historical Revisionism in Large Language Models

Francesco Ortu; Joeun Yook; Punya Syon Pandey; Keenan Samway; Bernhard Schölkopf; Alberto Cazzaniga; Rada Mihalcea; Zhijing Jin

arXiv:2602.17433·cs.CY·February 24, 2026

Open Access

TL;DR

This paper introduces a dataset and evaluation framework to detect revisionist narratives in large language models, highlighting their tendency to produce biased historical information under different prompting conditions.

Contribution

The authors present HistoricalMisinfo, a curated dataset of contested historical events, and an LLM-as-a-judge protocol to evaluate models' resistance to revisionist framing.

Findings

01. Models are closer to factual references under neutral prompts.

02. Revisionist prompts significantly increase revisionist outputs across models.

03. The framework enables benchmarking of models' robustness to historical revisionism.

Abstract

Large language models (LLMs) are increasingly used as sources of historical information, motivating the need for scalable audits on contested events and politically charged narratives in settings that mirror real user interactions. We introduce HistoricalMisinfo, a curated dataset of 500 contested events from 45 countries, each paired with a factual reference narrative and a documented revisionist reference narrative. To approximate real-world usage, we instantiate each event in 11 prompt scenarios that reflect common communication settings (e.g., questions, textbooks, social posts, policy briefs). Using an LLM-as-a-judge protocol that compares model outputs to the two references, we evaluate LLMs varying across model architectures in two conditions: (i) neutral user prompts that ask for factually accurate information, and (ii) robustness prompts in which the user…
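The evaluation layout described in the abstract (events crossed with prompt scenarios and two conditions, then a judge comparing each output to two references) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Event` fields, the scenario identifiers, the condition labels, and `overlap_judge` are all hypothetical names, and the word-overlap judge is only a toy stand-in for the LLM-as-a-judge step. Only four of the 11 scenarios are named in the abstract, so only those four appear here, and the sample texts are placeholders rather than dataset entries.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Event:
    """One contested event with its two reference narratives (hypothetical schema)."""
    event_id: str
    factual_ref: str
    revisionist_ref: str


# The abstract names only these four of the 11 scenarios; identifiers are assumed.
SCENARIOS = ["question", "textbook", "social_post", "policy_brief"]
# The two prompting conditions from the abstract; labels are assumed.
CONDITIONS = ["neutral", "robustness"]


def build_grid(events, scenarios=SCENARIOS, conditions=CONDITIONS):
    """Cross every event with every scenario and condition."""
    return list(product(events, scenarios, conditions))


def overlap_judge(output, event):
    """Toy stand-in for the LLM judge: pick the reference sharing more words."""
    words = set(output.lower().split())
    factual = len(words & set(event.factual_ref.lower().split()))
    revisionist = len(words & set(event.revisionist_ref.lower().split()))
    return "revisionist" if revisionist > factual else "factual"


def revisionism_rate(outputs, judge=overlap_judge):
    """Fraction of (event, model_output) pairs judged closer to the revisionist reference."""
    labels = [judge(text, event) for event, text in outputs]
    return sum(label == "revisionist" for label in labels) / len(labels)


# Placeholder event and outputs, not real dataset content.
demo_event = Event(
    event_id="E1",
    factual_ref="records confirm the event occurred",
    revisionist_ref="the event never occurred",
)
demo_grid = build_grid([demo_event])
demo_rate = revisionism_rate([
    (demo_event, "records confirm the event occurred"),
    (demo_event, "the event never occurred"),
])
print(len(demo_grid), demo_rate)
```

If every event were run under all 11 scenarios in both conditions, the grid would hold 500 × 11 × 2 = 11,000 prompts per model; the truncated abstract does not confirm that count, so treat it as an upper-level estimate under that assumption.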

Taxonomy

Topics: Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education · Topic Modeling