Entity Recognition and Relation Extraction from Scientific and Technical Texts in Russian

Elena Bruches; Alexey Pauls; Tatiana Batura; Vladimir Isachenko

arXiv:2011.09817·cs.CL·December 29, 2020

TL;DR

This paper explores methods for extracting entities and relations from Russian scientific texts, introduces a new dataset, and compares various extraction techniques including neural networks.

Contribution

It proposes modifications of information extraction methods for Russian and provides a new annotated corpus of scientific texts in Russian.

Findings

01

Neural network-based methods outperform keyword and vocabulary approaches.

02

The RuSERRC dataset contains 1600 unlabeled documents plus 80 documents labeled with entities and relations.

03

Comparison results highlight the effectiveness of neural models in Russian scientific text extraction.

Abstract

This paper is devoted to the study of methods for information extraction (entity recognition and relation classification) from scientific texts on information technology. Scientific publications provide valuable insight into cutting-edge scientific advances, but efficiently processing ever-growing amounts of data is a time-consuming task. In this paper, several modifications of these methods for the Russian language are proposed. The paper also reports the results of experiments comparing a keyword extraction method, a vocabulary method, and several methods based on neural networks. Text collections for these tasks exist for English and are actively used by the scientific community, but at present no such datasets are publicly available for Russian. In this paper, we present a corpus of scientific texts in Russian, RuSERRC. This dataset consists of 1600 unlabeled documents and 80…
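The abstract lists a vocabulary method among the compared approaches but does not describe how it works. As a hedged illustration only, the sketch below shows one generic form of dictionary-based term recognition: greedy longest-match lookup of (possibly multi-word) vocabulary terms over lowercased tokens. The function names `tokenize`, `build_vocabulary`, and `match_terms` and the example vocabulary are hypothetical, and this is not claimed to be the authors' implementation.

```python
import re
from typing import Iterable

# \w is Unicode-aware for str patterns in Python 3, so Cyrillic words are matched.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercased word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_vocabulary(terms: Iterable[str]) -> tuple[set[tuple[str, ...]], int]:
    """Store each term as a token tuple; also return the longest term length."""
    vocab = {tuple(tokenize(t)) for t in terms}
    vocab.discard(())  # ignore terms that contain no word tokens
    max_len = max((len(v) for v in vocab), default=0)
    return vocab, max_len


def match_terms(
    text: str, vocab: set[tuple[str, ...]], max_len: int
) -> list[tuple[int, int, str]]:
    """Greedy longest-match: return (start, end, term) token spans, end exclusive."""
    tokens = tokenize(text)
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in vocab:
                spans.append((i, i + n, " ".join(candidate)))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts at this token
    return spans


if __name__ == "__main__":
    # Hypothetical toy vocabulary, not taken from RuSERRC.
    vocab, max_len = build_vocabulary(["нейронная сеть", "сеть", "корпус"])
    print(match_terms("Корпус и нейронная сеть описаны в статье", vocab, max_len))
```

Longest-match ensures that "нейронная сеть" is reported as one term rather than as a separate match for "сеть". Because this sketch compares exact surface forms, inflected Russian forms such as "нейронной сети" are missed; a practical vocabulary method for Russian would likely need lemmatization or stemming first, which is an assumption here rather than something stated in the abstract.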
