Building a Japanese Document-Level Relation Extraction Dataset Assisted by Cross-Lingual Transfer

Youmi Ma; An Wang; Naoaki Okazaki

arXiv:2404.16506 · cs.CL · April 26, 2024

Open Access

TL;DR

This paper explores how existing English DocRE datasets can support DocRE in Japanese: it transfers an English dataset to Japanese, analyzes the errors of models trained on the result, and proposes a model-assisted annotation method that reduces human effort.

Contribution

It introduces a Japanese DocRE dataset built with the assistance of cross-lingual transfer and demonstrates that model-assisted annotation can improve annotation efficiency.

Findings

01. A model trained on the transferred dataset supplies initial relation predictions for annotators to edit.

02. Model assistance reduces human annotation steps by 50%.

03. Japanese DocRE presents unique challenges for cross-lingual transfer.
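
To make the second finding concrete, here is a minimal sketch of how annotation effort might be compared with and without model predictions. It assumes, and this is not stated in the listing above, that one edit step is one added or deleted relation triple. The triples, the entity names, and the helpers `edit_steps` and `recall` are hypothetical illustrations, not data or code from the paper.

```python
from typing import Set, Tuple

# A relation instance as (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def edit_steps(start: Set[Triple], gold: Set[Triple]) -> int:
    """Count additions plus deletions needed to turn `start` into `gold`.

    Assumption: each added or deleted triple costs exactly one step.
    """
    to_add = gold - start       # triples the annotator must add
    to_delete = start - gold    # spurious triples the annotator must remove
    return len(to_add) + len(to_delete)


def recall(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Fraction of gold triples that appear among the predictions."""
    return len(predicted & gold) / len(gold) if gold else 0.0


# Hypothetical gold annotation for one document.
gold = {
    ("Tokyo", "country", "Japan"),
    ("Kyoto", "country", "Japan"),
    ("Tokyo", "capital of", "Japan"),
    ("Osaka", "country", "Japan"),
}

# Hypothetical model output: two correct triples and one spurious triple.
model_predictions = {
    ("Tokyo", "country", "Japan"),
    ("Kyoto", "country", "Japan"),
    ("Kyoto", "capital of", "Japan"),
}

from_scratch = edit_steps(set(), gold)          # annotate with no suggestions
assisted = edit_steps(model_predictions, gold)  # edit the model's suggestions
model_recall = recall(model_predictions, gold)

print(from_scratch, assisted, model_recall)
```

In this toy case the annotator saves one step by starting from the predictions, even though the model misses half of the gold triples. Under this counting scheme, suggestions help only when they contain more correct triples than spurious ones.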

Abstract

Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While studies have been conducted on English DocRE, limited attention has been given to DocRE in non-English languages. This work delves into effectively utilizing existing English resources to promote DocRE studies in non-English languages, with Japanese as the representative case. As an initial attempt, we construct a dataset by transferring an English dataset to Japanese. However, models trained on such a dataset suffer from low recall. We investigate the error cases and attribute the failure to differences in surface structure and semantics between documents translated from English and those written by native speakers. We thus turn to exploring whether the transferred dataset can assist human annotation on Japanese documents. In our proposal, annotators edit relation predictions from…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling