Biographical: A Semi-Supervised Relation Extraction Dataset
Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov

TL;DR
This paper introduces Biographical, a semi-supervised relation extraction dataset created by aligning Wikipedia sentences with structured data, enabling improved training of neural models for biographical information extraction in digital humanities.
Contribution
The paper presents the first semi-supervised relation extraction (RE) dataset for biographical information. It is compiled automatically from the structure of Wikipedia and named entity recognition (NER), which avoids the cost of manual annotation.
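The alignment idea (label a Wikipedia sentence with a relation when an NER-detected entity in it matches a value from structured data about the person) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the helper `align`, the relation names and the input format are hypothetical, and it assumes the article's subject is the implicit subject of every sentence in that person's biography.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Example:
    sentence: str
    subject: str
    obj: str
    relation: str


def align(
    person: str,
    sentences: List[Tuple[str, List[str]]],
    facts: Dict[str, str],
) -> List[Example]:
    """Hypothetical distant-supervision labelling.

    sentences: (sentence text, entity mentions found by an NER tagger).
    facts: relation name -> value taken from structured data.
    Assumption: `person` is the subject of every sentence in the article.
    """
    examples = []
    for text, mentions in sentences:
        for relation, value in facts.items():
            # A sentence becomes a training example for `relation`
            # when one of its entity mentions equals the known value.
            if value in mentions:
                examples.append(Example(text, person, value, relation))
    return examples


sentences = [
    ("Ada Lovelace was born in London in 1815.", ["Ada Lovelace", "London", "1815"]),
    ("She worked with Charles Babbage.", ["Charles Babbage"]),
]
facts = {"birthplace": "London", "birthdate": "1815"}
for ex in align("Ada Lovelace", sentences, facts):
    print(ex.relation, ex.obj)
```

Exact string matching keeps the sketch short; a real alignment step would likely need normalisation of dates and name variants, which is left out here.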
Findings
A neural model trained on Biographical achieves high accuracy.
Relations in the dataset are matched with high precision.
The dataset is shown to be useful for digital humanities and historical research.
Abstract
Extracting biographical information from online documents is a popular research topic in the information extraction (IE) community. Various natural language processing (NLP) techniques, such as text classification, text summarisation and relation extraction (RE), are commonly used to achieve this. Among these techniques, RE is the most common, since it can be used directly to build biographical knowledge graphs. RE is usually framed as a supervised machine learning (ML) problem, where ML models are trained on annotated datasets. However, few annotated datasets exist for RE, because the annotation process can be costly and time-consuming. To address this, we developed Biographical, the first semi-supervised dataset for RE. The dataset, which is aimed at digital humanities (DH) and historical research, is automatically compiled by aligning sentences from Wikipedia articles with matching…
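The abstract notes that RE output can be used directly to build biographical knowledge graphs. A minimal sketch of that step, assuming extracted relations arrive as (subject, relation, object) triples, groups them into an adjacency list keyed by person. The function name `build_graph` and the relation labels are hypothetical illustrations, not part of the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (subject, relation, object) triples into an adjacency list."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subj, rel, obj in triples:
        # The same fact is often extracted from several sentences; keep one edge.
        if (rel, obj) not in graph[subj]:
            graph[subj].append((rel, obj))
    return dict(graph)


triples = [
    ("Ada Lovelace", "birthplace", "London"),
    ("Ada Lovelace", "collaborator", "Charles Babbage"),
    ("Ada Lovelace", "birthplace", "London"),  # duplicate from another sentence
]
print(build_graph(triples))
# {'Ada Lovelace': [('birthplace', 'London'), ('collaborator', 'Charles Babbage')]}
```

A plain dictionary is enough to show the idea; a production system would more likely load the triples into a graph store or an RDF library.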
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
