Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi's Zibaldone
Cristian Santini, Laura Melosi, Emanuele Frontoni

TL;DR
This paper evaluates the performance of modern language models on Named Entity Recognition tasks within 19th-century Italian texts, highlighting the challenges and proposing a new dataset for historical document analysis.
Contribution
It introduces a novel dataset from Leopardi's Zibaldone for NER in historical Italian texts and compares the effectiveness of domain-specific models and LLMs.
Findings
Fine-tuned NER models outperform LLMs on historical texts.
Instruction-tuned models face difficulties with complex historical language.
The new dataset enables reproducible evaluation of NER methods in historical humanities.
Abstract
The increased digitization of the world's textual heritage poses significant challenges for both computer science and literary studies. Overall, there is an urgent need for computational techniques able to adapt to the challenges of historical texts, such as orthographic and spelling variations, fragmentary structure and digitization errors. The rise of large language models (LLMs) has revolutionized natural language processing, suggesting promising applications for Named Entity Recognition (NER) on historical documents. In spite of this, no thorough evaluation has been proposed for Italian texts. This research tries to fill the gap by proposing a new challenging dataset for entity extraction based on a corpus of 19th-century scholarly notes, i.e. Giacomo Leopardi's Zibaldone (1898), containing 2,899 references to people, locations and literary works. This dataset was used to carry out…
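Comparing fine-tuned NER models with LLMs requires scoring predicted entities against gold annotations. This summary does not state which metric the paper uses, so the following is only a minimal sketch of one common choice, exact-match span-level precision, recall and F1. The `Entity` type, the `span_f1` function and the tag names `PER`, `LOC` and `WORK` are hypothetical and chosen for illustration; they are not taken from the paper or its dataset.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """A labelled character span (hypothetical representation)."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # illustrative tags: "PER", "LOC", "WORK"


def span_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts as correct only if
    its start, end and label all match a gold entity."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with made-up offsets: the second prediction has the
# right boundaries but the wrong label, so it does not count.
gold = [Entity(0, 7, "PER"), Entity(20, 24, "LOC"), Entity(30, 45, "WORK")]
pred = [Entity(0, 7, "PER"), Entity(20, 24, "PER"), Entity(30, 45, "WORK")]

precision, recall, f1 = span_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Exact matching is strict: a span that is off by one character, or that carries the wrong label, counts as both a false positive and a false negative. Relaxed or partial-overlap scoring is an alternative that may be more forgiving for historical texts with spelling variation.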
Taxonomy
Topics: Digital Humanities and Scholarship · Translation Studies and Practices · Natural Language Processing Techniques
