If I Could Turn Back Time: Temporal Reframing as a Historical Reasoning Task for LLMs
Lars Bungum, Charles Yijia Huang, Abeer Kashar

TL;DR
This paper explores whether large language models can perform temporal reasoning by answering historical trivia questions as if it were 1940, and examines how prompt language and model size affect the results.
Contribution
It introduces a novel temporal reframing task for LLMs, compares performance across prompt languages (English and Norwegian) and model sizes, and evaluates several LLM families on historical reasoning.
Findings
Unexpectedly, prompting in English outperforms prompting in Norwegian.
Larger LLMs yield better temporal reasoning results.
Model performance varies significantly across architectures.
Abstract
In this study, we investigate the ability of LLMs to perform temporal reasoning. Using a Norwegian trivia book from 1940, we prompt LLMs to answer its questions as if it were 1940, posing each question in both English and Norwegian. Since the correct answers are often given as full sentences, grading is done by means of LLM-as-judge, with sampled checks by a native speaker. Prompting in English consistently gave better results than prompting in Norwegian, an unexpected result. Using larger LLMs, in contrast, improved results. We tested the DeepSeek-R1, Gemma3, Qwen3, and Llama3.1 model families, as well as the largest available LLM built specifically for Norwegian.
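The evaluation protocol in the abstract (temporally reframed prompts in two languages, graded by an LLM judge, with a sample set aside for a native speaker) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the prompt templates, the judge wording, the YES/NO verdict format, and the names `Item`, `reframe`, `evaluate`, and `sample_for_manual_check` are all hypothetical. Models and the judge are abstracted as plain functions from prompt to text, so any backend could be plugged in; the demo question is illustrative and not taken from the 1940 book.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical reframing templates; the paper's actual prompt wording is not given here.
TEMPLATES = {
    "en": "Pretend it is the year 1940. Answer as someone living in 1940 would.\nQuestion: {q}",
    "no": "Lat som om det er året 1940. Svar slik en person i 1940 ville ha svart.\nSpørsmål: {q}",
}


@dataclass
class Item:
    question: dict[str, str]  # language code -> question text
    reference: str            # reference answer from the book (often a full sentence)


def reframe(question: str, lang: str) -> str:
    """Wrap a trivia question in a 'pretend it is 1940' instruction (assumed wording)."""
    return TEMPLATES[lang].format(q=question)


def judge_prompt(question: str, reference: str, candidate: str) -> str:
    """Hypothetical LLM-as-judge prompt asking for a YES/NO verdict."""
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Does the candidate answer match the reference? Reply YES or NO."
    )


def evaluate(items, models: dict[str, Callable[[str], str]],
             judge: Callable[[str], str], langs=("en", "no")):
    """Return accuracy per (model, language) pair plus per-answer records."""
    scores = defaultdict(list)
    records = []
    for name, model in models.items():
        for lang in langs:
            for item in items:
                answer = model(reframe(item.question[lang], lang))
                verdict = judge(judge_prompt(item.question[lang], item.reference, answer))
                correct = verdict.strip().upper().startswith("YES")
                scores[(name, lang)].append(correct)
                records.append((name, lang, item, answer, correct))
    accuracy = {key: sum(vals) / len(vals) for key, vals in scores.items()}
    return accuracy, records


def sample_for_manual_check(records, k: int, seed: int = 0):
    """Draw a reproducible random subset of graded answers for a native speaker to verify."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


# Demo with stub functions standing in for a real model and a real judge.
demo_items = [Item({"en": "Who is the King of Norway?", "no": "Hvem er Norges konge?"}, "Haakon VII")]
stub_model = lambda prompt: "Haakon VII"
stub_judge = lambda prompt: "YES" if "Candidate answer: Haakon VII" in prompt else "NO"
demo_acc, demo_records = evaluate(demo_items, {"stub": stub_model}, stub_judge)
print(demo_acc)  # {('stub', 'en'): 1.0, ('stub', 'no'): 1.0}
```

Keeping models as callables separates the reframing and grading logic from any particular inference API. Accuracy is reported per (model, language) pair, which is the granularity at which the abstract compares English against Norwegian prompting and smaller against larger models.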
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
