DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction
Abhyuday Bhartiya, Kartikeya Badola, Mausam

TL;DR
This paper introduces DiS-ReX, a large multilingual dataset for distantly supervised relation extraction (DS-RE). It removes unrealistic characteristics of the previous multilingual DS-RE dataset, which enables more realistic and diverse multilingual relation extraction research.
Contribution
We present DiS-ReX, a new, challenging multilingual dataset with more than 1.5 million sentences across four languages. It improves on existing datasets by alleviating their unrealistic characteristics.
Findings
DiS-ReX is more challenging than previous datasets.
Benchmark results indicate ample room for future research in multilingual DS-RE.
Our dataset enables more realistic evaluation of relation extraction models.
Abstract
Distant supervision (DS) is a well-established technique for creating large-scale datasets for relation extraction (RE) without human annotation. However, research in DS-RE has been mostly limited to English. Constraining RE to a single language prevents the use of large amounts of data in other languages, which could allow extraction of more diverse facts. Very recently, a dataset for multilingual DS-RE has been released. However, our analysis reveals that this dataset exhibits unrealistic characteristics, such as 1) a lack of sentences that do not express any relation, and 2) all sentences for a given entity pair expressing exactly one relation. We show that these characteristics lead to a gross overestimation of model performance. In response, we propose a new dataset, DiS-ReX, which alleviates these issues. Our dataset has more than 1.5 million…
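To make the two characteristics above concrete, here is a minimal Python sketch of generic bag-level distant supervision labelling. It is not the DiS-ReX construction pipeline. It assumes a toy knowledge base and sentences that are already tagged with an entity pair, and the names KB, build_bags and bag_statistics are hypothetical. Sentences are grouped into bags by entity pair, every KB relation of the pair is attached to the bag, and pairs absent from the KB are labelled NA. The sketch then measures the fraction of NA bags and of bags carrying more than one relation, which are the two properties the abstract says the earlier dataset lacks.

```python
from collections import defaultdict

NA = "NA"  # label for entity pairs that have no relation in the KB

# Hypothetical toy knowledge base: (head, tail) -> set of relation labels.
# A pair may hold several relations at once (multi-label bags).
KB = {
    ("Barack Obama", "Hawaii"): {"born_in"},
    ("Marie Curie", "Poland"): {"born_in", "citizen_of"},
}


def build_bags(mentions, kb):
    """Group (sentence, head, tail) triples into bags and attach distant labels."""
    bags = defaultdict(list)
    for sentence, head, tail in mentions:
        bags[(head, tail)].append(sentence)
    labelled = {}
    for pair, sentences in bags.items():
        # Distant supervision assumption: all KB relations of the pair are
        # assigned to the whole bag; pairs missing from the KB become NA.
        labels = set(kb.get(pair, {NA}))
        labelled[pair] = (sentences, labels)
    return labelled


def bag_statistics(labelled):
    """Return the fraction of NA bags and of bags with more than one relation."""
    total = len(labelled)
    na = sum(1 for _, labels in labelled.values() if labels == {NA})
    multi = sum(1 for _, labels in labelled.values() if len(labels) > 1)
    return {"na_fraction": na / total, "multi_label_fraction": multi / total}


# Hypothetical example sentences. The second one does not actually express
# born_in, yet DS labels it so: this is the label noise DS-RE models face.
mentions = [
    ("Obama was born in Hawaii.", "Barack Obama", "Hawaii"),
    ("Obama visited Hawaii in 2010.", "Barack Obama", "Hawaii"),
    ("Curie was born in Warsaw, Poland.", "Marie Curie", "Poland"),
    ("Curie and Einstein met in Brussels.", "Marie Curie", "Albert Einstein"),
]
bags = build_bags(mentions, KB)
stats = bag_statistics(bags)
print(stats)
```

In this toy example, one of the three bags is NA and one carries two relations. A dataset in which both fractions are zero rules out exactly the cases that make realistic DS-RE hard.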
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: mBERT
