A system for information extraction from scientific texts in Russian

Elena Bruches; Anastasia Mezentseva; Tatiana Batura

arXiv:2109.06703 · cs.CL · September 15, 2021


TL;DR

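This paper introduces a system for information extraction from Russian scientific texts. It recognizes terms, extracts relations between them, and links them to knowledge-base entities without requiring large amounts of labeled data, which supports applications such as information retrieval, recommendation systems, and classification.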

Contribution

The system performs term recognition, relation extraction, and entity linking end-to-end for Russian without large labeled datasets, making it suitable for low- and mid-resource settings.

Findings

01. Effective term recognition and relation extraction in Russian texts.

02. No large labeled datasets required, reducing resource needs.

03. Source code publicly available for research use.

Abstract

In this paper, we present a system for information extraction from scientific texts in the Russian language. The system performs several tasks in an end-to-end manner: term recognition, extraction of relations between terms, and linking of terms to entities in a knowledge base. These tasks are extremely important for information retrieval, recommendation systems, and classification. The advantage of the implemented methods is that the system does not require a large amount of labeled data, which saves time and effort on data labeling; the system can therefore be applied in low- and mid-resource settings. The source code is publicly available and can be used for various research purposes.
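To make the end-to-end flow concrete, here is a minimal Python sketch of how the three stages could be chained, with each stage consuming the previous stage's output. It is not the authors' implementation. The case-insensitive dictionary matcher, the cue-phrase relation rule, and the exact-match linker are deliberately simple stand-ins, since the actual models are not described in this summary. All names (`recognize_terms`, `extract_relations`, `link_terms`, `run_pipeline`, the `KB:` identifiers, the relation labels) are hypothetical. The example sentence is in English for readability, although the system itself targets Russian.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy stand-ins for each stage; NOT the method from the paper.


@dataclass
class Term:
    text: str
    start: int                       # character offset where the term begins
    end: int                         # character offset just past the term
    entity_id: Optional[str] = None  # knowledge-base ID, set by link_terms


@dataclass
class Relation:
    head: Term
    tail: Term
    label: str


def recognize_terms(text: str, vocabulary: list[str]) -> list[Term]:
    """Stage 1 (toy): case-insensitive dictionary lookup of known phrases."""
    lowered = text.lower()
    terms = []
    for phrase in vocabulary:
        key = phrase.lower()
        pos = lowered.find(key)
        while pos != -1:
            terms.append(Term(text[pos:pos + len(key)], pos, pos + len(key)))
            pos = lowered.find(key, pos + 1)
    return sorted(terms, key=lambda t: t.start)


def extract_relations(text: str, terms: list[Term],
                      cues: dict[str, str]) -> list[Relation]:
    """Stage 2 (toy): label two adjacent terms when the text between them
    is exactly a known cue phrase."""
    relations = []
    for head, tail in zip(terms, terms[1:]):
        between = text[head.end:tail.start].strip().lower()
        if between in cues:
            relations.append(Relation(head, tail, cues[between]))
    return relations


def link_terms(terms: list[Term], kb: dict[str, str]) -> None:
    """Stage 3 (toy): exact match of the lowercased term against KB names;
    unmatched terms keep entity_id = None."""
    for term in terms:
        term.entity_id = kb.get(term.text.lower())


def run_pipeline(text, vocabulary, cues, kb):
    terms = recognize_terms(text, vocabulary)
    relations = extract_relations(text, terms, cues)
    link_terms(terms, kb)  # mutates the same Term objects the relations hold
    return terms, relations


text = "BERT is a language model used for text classification."
vocabulary = ["BERT", "language model", "text classification"]
cues = {"is a": "IS_A", "used for": "USED_FOR"}
kb = {"bert": "KB:001", "language model": "KB:002"}

terms, relations = run_pipeline(text, vocabulary, cues, kb)
for r in relations:
    print(r.head.text, r.label, r.tail.text)
# BERT IS_A language model
# language model USED_FOR text classification
```

The point of the structure is that each stage only adds information to shared `Term` objects, so a trained term recognizer, relation classifier, or linker could replace any toy stage without changing the others.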


Taxonomy

Topics: Semantic Web and Ontologies · Advanced Text Analysis Techniques · Natural Language Processing Techniques