Named Entity Recognition of Historical Texts via Large Language Model

Shibingfeng Zhang; Giovanni Colavizza

arXiv:2508.18090 · cs.DL · April 29, 2026

TL;DR

This paper investigates the use of large language models for named entity recognition in historical texts, demonstrating promising zero-shot and few-shot performance despite data scarcity and language variability.

Contribution

The paper explores applying LLMs to historical NER with little or no annotated training data, highlighting their potential as an alternative to supervised methods in low-resource settings.

Findings

1. LLMs achieve reasonably strong NER performance on historical texts.

2. Their performance falls below that of supervised models but is still promising.

3. Zero-shot and few-shot prompting are effective in low-resource scenarios.
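
Zero-shot and few-shot prompting differ only in whether labeled demonstrations are placed in the prompt before the text to be tagged. The sketch below illustrates that difference and a defensive way to read the model's answer. It is not the authors' prompt: the PER/LOC/ORG tag set, the JSON answer format, and the helper names `build_ner_prompt` and `parse_entities` are all hypothetical, and no model is actually called.

```python
import json

# Hypothetical tag set for illustration; the paper's label inventory may differ.
ENTITY_TYPES = ["PER", "LOC", "ORG"]


def build_ner_prompt(text, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    Each example is a (sentence, entities) pair, where entities is a list of
    {"text": ..., "type": ...} dicts shown to the model as demonstrations.
    """
    lines = [
        "Extract the named entities from the historical text below.",
        f"Allowed entity types: {', '.join(ENTITY_TYPES)}.",
        'Answer with a JSON list of objects with keys "text" and "type".',
    ]
    for sentence, entities in examples:  # empty for zero-shot
        lines.append(f"Text: {sentence}")
        lines.append(f"Entities: {json.dumps(entities, ensure_ascii=False)}")
    lines.append(f"Text: {text}")
    lines.append("Entities:")
    return "\n".join(lines)


def parse_entities(response, text):
    """Parse the model's JSON answer into (surface, type) pairs.

    Keeps only entities with an allowed type whose surface string occurs
    in the input text; malformed output yields no predictions.
    """
    try:
        items = json.loads(response)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        surface, label = item.get("text"), item.get("type")
        if isinstance(surface, str) and surface and label in ENTITY_TYPES and surface in text:
            entities.append((surface, label))
    return entities


demo = [(
    "The Duke of Wellington arrived in Brussels in 1815.",
    [{"text": "Duke of Wellington", "type": "PER"}, {"text": "Brussels", "type": "LOC"}],
)]
text = "Napoleon reached Milan in 1796."
prompt = build_ner_prompt(text, demo)  # few-shot; call without `demo` for zero-shot
# A real pipeline would send `prompt` to an LLM; this reply is hard-coded.
reply = '[{"text": "Napoleon", "type": "PER"}, {"text": "Milan", "type": "LOC"}, {"text": "Paris", "type": "LOC"}]'
print(parse_entities(reply, text))  # [('Napoleon', 'PER'), ('Milan', 'LOC')]
```

Dropping entities whose surface string does not appear in the input ("Paris" above) is one simple guard against hallucinated output; a stricter pipeline would also map each surface string back to character offsets.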

Abstract

Large language models (LLMs) have demonstrated remarkable versatility across a wide range of natural language processing tasks and domains. One such task is Named Entity Recognition (NER), which involves identifying and classifying named entities in text, such as people, organizations, locations, dates, and other specific entities. NER plays a crucial role in extracting information from unstructured textual data, enabling downstream applications such as information retrieval. Traditionally, NER is addressed using supervised machine learning approaches, which require large amounts of annotated training data. However, historical texts present a unique challenge, as annotated datasets are often scarce or nonexistent due to the high cost and expertise required for manual labeling. In addition, the variability and noise inherent in historical language, such as…
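
Supervised NER systems of the kind the abstract contrasts with LLMs are commonly trained as token classifiers, with each entity span encoded as BIO tags (B- marks the first token of an entity, I- the following tokens, O everything else). The sketch below shows that encoding for flat, non-overlapping spans. The function name `spans_to_bio`, the tag names, and the example sentence are illustrative assumptions, not the scheme or data used in the paper.

```python
def spans_to_bio(tokens, spans):
    """Convert typed entity spans into one BIO tag per token.

    tokens: list of token strings.
    spans: list of (start, end, type) with token indices, end exclusive.
    Assumes flat NER: overlapping spans raise ValueError.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span {(start, end, label)}")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # continuation tokens
    return tags


tokens = ["The", "Duke", "of", "Wellington", "entered", "Brussels", "."]
spans = [(1, 4, "PER"), (5, 6, "LOC")]
print(spans_to_bio(tokens, spans))
# ['O', 'B-PER', 'I-PER', 'I-PER', 'O', 'B-LOC', 'O']
```

An LLM prompted for entities returns spans or strings rather than per-token tags, so comparing it against a supervised tagger usually means converting both outputs to a common representation like this one before scoring.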