Building a Japanese Document-Level Relation Extraction Dataset Assisted by Cross-Lingual Transfer
Youmi Ma, An Wang, Naoaki Okazaki

TL;DR
This paper explores how English DocRE datasets can be leveraged for Japanese through cross-lingual transfer: it constructs a transferred Japanese dataset, analyzes the resulting errors, and proposes a model-assisted annotation method that reduces human effort.
Contribution
It introduces a Japanese DocRE dataset created via cross-lingual transfer and demonstrates how model-assisted annotation can improve efficiency.
Findings
Transferred datasets improve initial relation predictions.
Model assistance reduces human annotation steps by roughly 50%.
Japanese DocRE presents unique challenges for cross-lingual transfer.
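The two quantities behind these findings, recall and the number of human edit steps, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the triple format, the toy data, and in particular the cost model (one step per deleted or added triple) are not taken from the paper, whose exact definition of an edit step is not given in this summary.

```python
from typing import Set, Tuple

# Hypothetical triple format: (head entity, relation label, tail entity).
Triple = Tuple[str, str, str]


def recall(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Fraction of gold triples that appear among the predictions."""
    if not gold:
        return 1.0
    return len(predicted & gold) / len(gold)


def edit_steps(start: Set[Triple], gold: Set[Triple]) -> int:
    """Assumed cost model: one step per wrong triple deleted
    plus one step per missing triple added."""
    return len(start - gold) + len(gold - start)


# Toy data, made up for this sketch.
gold = {
    ("entity_A", "author_of", "entity_B"),
    ("entity_A", "born_in", "entity_C"),
    ("entity_B", "published_in", "entity_D"),
    ("entity_A", "educated_at", "entity_E"),
}
model_predictions = {
    ("entity_A", "author_of", "entity_B"),   # correct
    ("entity_A", "born_in", "entity_C"),     # correct
    ("entity_B", "author_of", "entity_A"),   # wrong, must be deleted
}

from_scratch = edit_steps(set(), gold)            # annotator starts with nothing
with_model = edit_steps(model_predictions, gold)  # annotator edits predictions

print("recall:", recall(model_predictions, gold))
print("steps from scratch:", from_scratch)
print("steps with model:", with_model)
```

Under this cost model, predictions help only when they contain more correct triples than wrong ones, since every wrong prediction costs a deletion.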
Abstract
Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While studies have been conducted on English DocRE, limited attention has been given to DocRE in non-English languages. This work investigates how to use existing English resources effectively to promote DocRE studies in non-English languages, with Japanese as the representative case. As an initial attempt, we construct a dataset by transferring an English dataset to Japanese. However, models trained on such a dataset suffer from low recall. We investigate the error cases and attribute the failure to differences in surface structure and semantics between documents translated from English and those written by native speakers. We therefore switch to exploring whether the transferred dataset can assist human annotation of Japanese documents. In our proposal, annotators edit relation predictions from…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
