RuBioRoBERTa: a pre-trained biomedical language model for Russian language biomedical text mining
Alexander Yalunin, Alexander Nesterov, and Dmitriy Umerenkov

TL;DR
This paper introduces RuBioRoBERTa, a pre-trained Russian biomedical language model that achieves state-of-the-art performance across various biomedical NLP tasks in Russian.
Contribution
The paper pre-trains RuBioBERT and RuBioRoBERTa on freely available Russian biomedical texts and reports state-of-the-art results on the RuMedBench benchmark.
Findings
State-of-the-art results on RuMedBench
Effective for text classification and NER
Improves Russian biomedical text understanding
Abstract
This paper presents several BERT-based models for Russian language biomedical text mining (RuBioBERT, RuBioRoBERTa). The models are pre-trained on a corpus of freely available texts in the Russian biomedical domain. With this pre-training, our models demonstrate state-of-the-art results on RuMedBench, a Russian medical language understanding benchmark that covers a diverse set of tasks, including text classification, question answering, natural language inference, and named entity recognition.
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
