Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, and Barbara Plank

arXiv:2305.10985·cs.CL·May 19, 2023·1 cites

TL;DR

This paper introduces Multi-CrossRE, the broadest multi-lingual relation extraction dataset, covering 26 languages in addition to English and six text domains. It enables research beyond English, and baseline experiments on back-translated data indicate that the translations are of high quality.

Contribution

It presents Multi-CrossRE, a novel multi-lingual, multi-domain relation extraction dataset based on machine translation of CrossRE, filling a major resource gap in non-English RE research.

Findings

01

Baseline results are consistent across original and back-translated data.

02

Native speakers checked a sub-portion of more than 200 sentences in seven diverse languages.

03

Dataset covers 26 languages in addition to English and six text domains, broadening RE research scope.

Abstract

Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating high quality of the translation and the resulting dataset.
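The back-translation sanity check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the choice of macro-F1 as the metric, the function names `macro_f1` and `back_translation_gaps`, the language codes, and the toy labels are all assumptions made for the example. It also assumes that gold relation labels stay the same after translation, so one gold list can be reused for every back-translated set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over all labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def back_translation_gaps(gold, original_pred, back_translated_preds):
    """Per language: F1 on back-translated data minus F1 on original English."""
    reference = macro_f1(gold, original_pred)
    return {
        lang: macro_f1(gold, pred) - reference
        for lang, pred in back_translated_preds.items()
    }


# Toy, hypothetical labels and predictions (not taken from the dataset).
gold = ["rel_a", "rel_b", "rel_a", "rel_c"]
original_pred = ["rel_a", "rel_b", "rel_b", "rel_c"]
back_translated_preds = {
    "de": ["rel_a", "rel_b", "rel_b", "rel_c"],  # same output as on the original
    "fi": ["rel_a", "rel_b", "rel_a", "rel_b"],  # different errors after round trip
}
gaps = back_translation_gaps(gold, original_pred, back_translated_preds)
```

Gaps close to zero for every language would correspond to the "consistent" outcome the abstract reports; a large negative gap for one language would suggest that the round trip through that language degraded the text.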

Code & Models

Repositories

mainlp/crossre
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems