HistRED: A Historical Document-Level Relation Extraction Dataset
Soyoung Yang, Minseok Choi, Youngwoo Cho, Jaegul Choo

TL;DR
HistRED is a new bilingual dataset for historical document-level relation extraction, enabling research on Korean and Hanja texts with diverse context lengths, and demonstrating improved RE performance using multi-language information.
Contribution
We introduce HistRED, a novel dataset for historical relation extraction with bilingual annotations and variable text lengths, and propose a model leveraging both languages for better accuracy.
Findings
Our bilingual model outperforms monolingual baselines.
HistRED supports diverse context lengths for robust evaluation.
The dataset is publicly available for research use.
Abstract
Despite the extensive applications of relation extraction (RE) in various domains, little has been explored in the historical context, which contains promising data across hundreds and thousands of years. To promote historical RE research, we present HistRED, constructed from Yeonhaengnok. Yeonhaengnok is a collection of records originally written in Hanja, the classical Chinese writing, which were later translated into Korean. HistRED provides bilingual annotations so that RE can be performed on both Korean and Hanja texts. In addition, HistRED supports various self-contained subtexts of different lengths, from sentence level to document level, offering diverse context settings for researchers to evaluate the robustness of their RE models. To demonstrate the usefulness of our dataset, we propose a bilingual RE model that leverages both Korean and Hanja contexts to…
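The idea of self-contained subtexts of different lengths can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Entity and Relation classes, the subtext function, the field names, and the toy data are hypothetical and do not reflect HistRED's actual schema or the authors' construction procedure. The sketch only shows one plausible reading of "self-contained": a relation is kept for a subtext only when both of its entities are mentioned inside that subtext.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    sent_id: int  # index of the sentence that mentions the entity (hypothetical field)


@dataclass(frozen=True)
class Relation:
    head: int   # index into the entity list
    tail: int   # index into the entity list
    label: str  # placeholder relation label


def subtext(sentences, entities, relations, start, length):
    """Return sentences[start:start+length] and the relations whose
    head and tail entities are both mentioned inside that window."""
    end = min(start + length, len(sentences))
    inside = {i for i, e in enumerate(entities) if start <= e.sent_id < end}
    kept = [r for r in relations if r.head in inside and r.tail in inside]
    return sentences[start:end], kept


# Toy document: placeholder sentences, not real Yeonhaengnok text.
sentences = ["s0", "s1", "s2"]
entities = [Entity("A", 0), Entity("B", 0), Entity("C", 2)]
relations = [Relation(0, 1, "rel_x"), Relation(0, 2, "rel_y")]

# Sentence-level context keeps only the relation contained in sentence 0.
sent_level = subtext(sentences, entities, relations, start=0, length=1)
# Document-level context keeps both relations.
doc_level = subtext(sentences, entities, relations, start=0, length=3)
```

Under these assumptions, widening the window from one sentence to the whole document changes which relations a model is asked to recover, which is the kind of context variation the abstract describes for robustness evaluation.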
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
