RuBioRoBERTa: a pre-trained biomedical language model for Russian language biomedical text mining

Alexander Yalunin, Alexander Nesterov, and Dmitriy Umerenkov

arXiv:2204.03951·cs.CL·April 11, 2022·5 cites

TL;DR

This paper introduces RuBioBERT and RuBioRoBERTa, BERT-based language models pre-trained on Russian biomedical text, which achieve state-of-the-art results on RuMedBench, a Russian medical language understanding benchmark.

Contribution

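The paper pre-trains RuBioBERT and RuBioRoBERTa on a corpus of freely available Russian biomedical texts and evaluates them on RuMedBench, whose tasks include text classification, question answering, natural language inference, and named entity recognition.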

Findings

01. State-of-the-art results on RuMedBench
02. Effective for text classification and NER
03. Improves Russian biomedical text understanding

Abstract

This paper presents several BERT-based models for Russian language biomedical text mining (RuBioBERT, RuBioRoBERTa). The models are pre-trained on a corpus of freely available texts in the Russian biomedical domain. With this pre-training, our models demonstrate state-of-the-art results on RuMedBench, a Russian medical language understanding benchmark that covers a diverse set of tasks, including text classification, question answering, natural language inference, and named entity recognition.

Code & Models

Repositories

pavel-blinov/RuMedBench (PyTorch)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques