TERMinator: A system for scientific texts processing
Elena Bruches, Olga Tikhobaeva, Yana Dementyeva, Tatiana Batura

TL;DR
This paper introduces TERMinator, a system for extracting entities and semantic relations from scientific texts, along with a new annotated dataset, highlighting the impact of language models and heuristics on extraction performance.
Contribution
The paper presents a novel system and an annotated dataset for scientific text processing, studies how the choice of language model affects term recognition, and compares approaches to relation extraction, including heuristics.
Findings
Language models pre-trained on the target language do not always outperform others.
Heuristic approaches can enhance relation extraction quality.
The system and dataset are publicly available for research use.
Abstract
This paper addresses the extraction of entities and the semantic relations between them from scientific texts, where scientific terms are treated as entities. We present a dataset that includes annotations for two tasks and develop a system called TERMinator to study the influence of language models on term recognition and to compare different approaches to relation extraction. Experiments show that language models pre-trained on the target language do not always show the best performance. Adding heuristic approaches may also improve the overall quality on a particular task. The developed tool and the annotated corpus are publicly available at https://github.com/iis-research-team/terminator and may be useful for other researchers.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
