Multilingual Event Extraction from Historical Newspaper Adverts
Nadav Borenstein, Natalia da Silva Perez, Isabelle Augenstein

TL;DR
This paper introduces a multilingual dataset for event extraction from historical newspaper ads, demonstrating that extractive QA models and machine translation can effectively address low-resource challenges in historical NLP tasks.
Contribution
It presents a new multilingual dataset for historical event extraction and explores effective methods like extractive QA and machine translation for low-resource historical languages.
Findings
Extractive QA models perform well with scarce data.
Machine translation often outperforms low-resource learning methods.
Cross-lingual transfer remains highly challenging.
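To make the first finding concrete, here is a minimal sketch of how event extraction can be recast as extractive QA: each event attribute becomes a question, and the annotated span becomes the answer inside the ad text. The attribute names, the question wording, and the sample ad are invented for illustration and are not taken from the paper's dataset or annotation schema. The output record layout follows the common SQuAD-style convention, which is an assumption about what a downstream QA model would consume.

```python
from typing import Dict, List

# Hypothetical attribute -> question mapping (illustrative, not the paper's schema).
QUESTIONS: Dict[str, str] = {
    "name": "What is the name of the person?",
    "age": "How old is the person?",
    "reward": "What reward is offered?",
    "destination": "Where is the person believed to be heading?",
}


def to_qa_examples(ad_text: str, spans: Dict[str, str]) -> List[dict]:
    """Turn one annotated ad into one SQuAD-style record per attribute.

    Attributes with no annotated span (or whose span is not found in the
    text) become unanswerable questions with empty answer lists.
    """
    examples = []
    for attribute, question in QUESTIONS.items():
        answer = spans.get(attribute)
        start = ad_text.find(answer) if answer else -1
        if start == -1:
            answers = {"text": [], "answer_start": []}
        else:
            answers = {"text": [answer], "answer_start": [start]}
        examples.append(
            {
                "attribute": attribute,
                "question": question,
                "context": ad_text,
                "answers": answers,
            }
        )
    return examples


# Invented sample ad and annotations, for illustration only.
AD = (
    "Ran away from the subscriber, a man named Tom, about 25 years of age. "
    "A reward of five dollars will be paid."
)
SPANS = {"name": "Tom", "age": "25 years", "reward": "five dollars"}

examples = to_qa_examples(AD, SPANS)

if __name__ == "__main__":
    for ex in examples:
        print(ex["attribute"], ex["answers"])
```

One reason this framing can suit scarce data is that a single pretrained QA model handles every attribute through its question text, so no attribute-specific output layer has to be learned from the few labeled ads available.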
Abstract
NLP methods can aid historians in analyzing textual materials in greater volumes than manually feasible. Developing such methods poses substantial challenges though. First, acquiring large, annotated historical datasets is difficult, as only domain experts can reliably label them. Second, most available off-the-shelf NLP models are trained on modern language texts, rendering them significantly less effective when applied to historical corpora. This is particularly problematic for less well studied tasks, and for languages other than English. This paper addresses these challenges while focusing on the under-explored task of event extraction from a novel domain of historical texts. We introduce a new multilingual dataset in English, French, and Dutch composed of newspaper ads from the early modern colonial period reporting on enslaved people who liberated themselves from enslavement. We…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
