Towards the AI Historian: Agentic Information Extraction from Primary Sources
Lorenz Hufe, Niclas Griesshaber, Gavin Greif, Sebastian Oliver Eck, Philip Torr

TL;DR
This paper introduces the first module of Chronos, an AI Historian under development. The module lets historians extract data from image scans of primary sources through natural-language interaction, adapt workflows to their corpora, and refine those workflows iteratively.
Contribution
It presents an open-source module that facilitates flexible, natural-language-based information extraction from primary sources for historical research.
Findings
The module supports heterogeneous source corpora.
Historians can evaluate AI model performance on specific tasks.
The system allows iterative workflow refinement through natural language.
Abstract
AI is supporting, accelerating, and automating scientific discovery across a diverse set of fields. However, AI adoption in historical research remains limited due to the lack of solutions designed for historians. In this technical progress report, we introduce the first module of Chronos, an AI Historian under development. This module enables historians to convert image scans of primary sources into data through natural-language interactions. Rather than imposing a fixed extraction pipeline powered by a vision-language model (VLM), it allows historians to adapt workflows for heterogeneous source corpora, evaluate the performance of AI models on specific tasks, and iteratively refine workflows through natural-language interaction with the Chronos agent. The module is open-source and ready to be used by historical researchers on their own sources.
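To make the idea of evaluating a model on a specific extraction task concrete, the following is a minimal sketch and not the Chronos implementation. It assumes that extracted records and hand-checked reference records are available as lists of dictionaries in the same order. The names `field_accuracy` and `normalize`, and the example fields `name` and `year`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences are not counted as errors."""
    return " ".join(value.split()).casefold()


def field_accuracy(predicted: List[Record], gold: List[Record]) -> Dict[str, float]:
    """Return, per field, the share of gold values that the extraction reproduced exactly
    (after normalization). Records are compared position by position."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of records")

    # Only fields that occur in the gold data are scored.
    fields = sorted({key for record in gold for key in record})
    scores: Dict[str, float] = {}
    for field in fields:
        total = sum(1 for g in gold if field in g)
        correct = sum(
            1
            for p, g in zip(predicted, gold)
            if field in g and normalize(p.get(field, "")) == normalize(g[field])
        )
        scores[field] = correct / total
    return scores


# Hypothetical example: two records transcribed by hand and by a model.
gold = [
    {"name": "Anna Meyer", "year": "1843"},
    {"name": "Johann Schmidt", "year": "1851"},
]
predicted = [
    {"name": "anna  meyer", "year": "1843"},
    {"name": "Johann Schmidt", "year": "1857"},
]

scores = field_accuracy(predicted, gold)
print(scores)
```

A per-field score of this kind shows which parts of a source (here, the year column) a given model reads unreliably, which is the sort of signal a historian could use when deciding how to refine a workflow.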
