ENEIDE: A High Quality Silver Standard Dataset for Named Entity Recognition and Linking in Historical Italian
Cristian Santini, Sebastian Barzaghi, Paolo Sernani, Emanuele Frontoni, Laura Melosi, Mehwish Alam

TL;DR
ENEIDE is a novel, publicly available dataset for Named Entity Recognition and Linking in historical Italian texts, covering two centuries and supporting diverse entity types with Wikidata links.
Contribution
The paper presents ENEIDE, the first multi-domain, high-quality silver standard dataset for NERL in historical Italian, together with the semi-automatic annotation methodology used to build it.
Findings
Baseline experiments show that the dataset is challenging and reveal a performance gap between zero-shot and fine-tuned approaches.
The dataset supports temporal disambiguation and cross-domain evaluation.
ENEIDE covers two centuries of Italian history, enabling diachronic studies.
Abstract
This paper introduces ENEIDE (Extracting Named Entities from Italian Digital Editions), a silver standard dataset for Named Entity Recognition and Linking (NERL) in historical Italian texts. The corpus comprises 2,111 documents with over 8,000 entity annotations semi-automatically extracted from two scholarly digital editions: Digital Zibaldone, the philosophical diary of the Italian poet Giacomo Leopardi (1798–1837), and Aldo Moro Digitale, the complete works of the Italian politician Aldo Moro (1916–1978). Annotations cover multiple entity types (person, location, organization, literary work) linked to Wikidata identifiers, including NIL entities that cannot be mapped to the knowledge graph. To the best of our knowledge, ENEIDE represents the first multi-domain, publicly available NERL dataset for historical Italian with training, development, and test splits. We present a…
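The abstract describes each annotation as a typed entity span that is either linked to a Wikidata identifier or marked NIL. As a rough illustration only, the Python sketch below models such records and counts mentions per type and NIL mentions. The field names, the short type labels (PER, LOC, ORG, WORK), the character-offset convention, and the placeholder QID are all assumptions made for this example, not ENEIDE's published data format.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical short labels for the four entity types listed in the abstract.
ENTITY_TYPES = {"PER", "LOC", "ORG", "WORK"}
# Wikidata item identifiers have the form "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


@dataclass(frozen=True)
class EntityMention:
    """One annotated mention; field names are illustrative, not ENEIDE's schema."""
    start: int                  # character offset (inclusive), assumed convention
    end: int                    # character offset (exclusive), assumed convention
    entity_type: str
    wikidata_id: Optional[str]  # None marks a NIL entity (no Wikidata match)

    def __post_init__(self) -> None:
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must be non-empty with 0 <= start < end")
        if self.wikidata_id is not None and not QID_PATTERN.fullmatch(self.wikidata_id):
            raise ValueError(f"malformed Wikidata ID: {self.wikidata_id}")

    @property
    def is_nil(self) -> bool:
        return self.wikidata_id is None


@dataclass
class Document:
    doc_id: str
    text: str
    mentions: list[EntityMention]

    def surfaces(self) -> list[str]:
        """Return the text covered by each mention, checking offsets fit the text."""
        out = []
        for m in self.mentions:
            if m.end > len(self.text):
                raise ValueError(f"mention {m} runs past the end of {self.doc_id}")
            out.append(self.text[m.start:m.end])
        return out


def summarize(docs: list[Document]) -> tuple[Counter, int]:
    """Count mentions per entity type and the number of NIL mentions."""
    types = Counter(m.entity_type for d in docs for m in d.mentions)
    nil = sum(m.is_nil for d in docs for m in d.mentions)
    return types, nil


if __name__ == "__main__":
    # Toy document: "Q12345" is a placeholder, not the real Wikidata link,
    # and the place name is marked NIL purely to illustrate the NIL case.
    text = "Leopardi scrisse a Recanati."
    doc = Document("toy-1", text, [
        EntityMention(0, 8, "PER", "Q12345"),
        EntityMention(19, 27, "LOC", None),
    ])
    print(doc.surfaces())
    print(summarize([doc]))
```

Keeping NIL as an explicit `None` rather than dropping unlinked mentions preserves them for the recognition step while still letting a linking evaluation treat them as a separate outcome.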
