Entity Recognition and Relation Extraction from Scientific and Technical Texts in Russian
Elena Bruches, Alexey Pauls, Tatiana Batura, Vladimir Isachenko

TL;DR
This paper studies methods for extracting entities and relations from Russian scientific texts on information technology, introduces a new dataset (RuSERRC), and compares a keyword extraction method, a vocabulary method, and several neural network-based methods.
Contribution
The paper proposes modifications of information extraction methods for the Russian language and provides a new annotated corpus of scientific texts in Russian.
Findings
Neural network-based methods outperform keyword and vocabulary approaches.
The RuSERRC dataset consists of 1600 unlabeled documents and 80 documents labeled for entities and relations.
Comparison results highlight the effectiveness of neural models in Russian scientific text extraction.
Abstract
This paper is devoted to the study of methods for information extraction (entity recognition and relation classification) from scientific texts on information technology. Scientific publications provide valuable insight into cutting-edge scientific advances, but efficient processing of increasing amounts of data is a time-consuming task. In this paper, several modifications of methods for the Russian language are proposed. The paper also includes the results of experiments comparing a keyword extraction method, a vocabulary method, and some methods based on neural networks. Text collections for these tasks exist for the English language and are actively used by the scientific community, but at present, such datasets in Russian are not publicly available. In this paper, we present a corpus of scientific texts in Russian, RuSERRC. This dataset consists of 1600 unlabeled documents and 80…
