RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain

Vitaly Ivanin; Ekaterina Artemova; Tatiana Batura; Vladimir Ivanov; Veronika Sarkisyan; Elena Tutubalina; Ivan Smurov

arXiv:2010.15939·cs.CL·November 2, 2020

TL;DR

This paper explores the challenges of applying state-of-the-art NER and relation extraction models to a novel, non-English e-government corpus with unique annotation schemes, highlighting the need for more advanced methods.

Contribution

The study presents a comprehensive pipeline for joint NER and RE in a specialized domain, revealing limitations of current transformer-based models and emphasizing the necessity for improved techniques.

Findings

1. Transformer models show modest performance on the domain-specific corpus.
2. Fine-tuning on unlabeled data does not significantly improve results.
3. Current NER and RE technologies are insufficient for complex, non-English government documents.

Abstract

We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on large unlabeled corpora does not automatically yield significant improvement, and thus we conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole developed pipeline, starting from text annotation,…
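The abstract contrasts approaching NER and RE sequentially with an end-to-end model. As a minimal sketch of the sequential setting, assuming the NER model emits BIO tags (an assumption; the tagging scheme is not stated here), the step below decodes tags into entity spans and enumerates the ordered entity pairs that a separate relation classifier would then label. The function names `bio_to_spans` and `candidate_pairs` and the entity types `ENT_A`/`ENT_B` are hypothetical placeholders, not taken from the paper or the RuREBus annotation scheme.

```python
from itertools import permutations
from typing import List, Optional, Tuple

# Hypothetical span format: (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence (the NER step's output) into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        # A span ends at "O", at a new "B-", or at an "I-" of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            # Lenient decoding: a stray "I-" is treated like a "B-".
            if tag != "O":
                start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Enumerate ordered (head, tail) pairs to pass to a downstream RE classifier."""
    return list(permutations(spans, 2))


if __name__ == "__main__":
    tags = ["B-ENT_A", "I-ENT_A", "O", "B-ENT_B", "O", "I-ENT_A"]
    spans = bio_to_spans(tags)
    print(spans)  # [(0, 2, 'ENT_A'), (3, 4, 'ENT_B'), (5, 6, 'ENT_A')]
    print(len(candidate_pairs(spans)))  # 6 ordered pairs for 3 entities
```

Pairs are ordered because relations are typically directional; an end-to-end model would instead predict spans and relations jointly, avoiding the error propagation this two-stage design inherits from the NER step.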