An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets
Alexander Sboev, Sanna Sboeva, Ivan Moloshnikov, Artem Gryaznov, Roman Rybka, Alexander Naumov, Anton Selivanov, Gleb Rylkov, Viacheslav Ilyin

TL;DR
This paper introduces a comprehensive Russian NER-labeled corpus of Internet drug reviews and evaluates deep learning models for extracting pharmacological entities, establishing state-of-the-art results for Russian text analysis.
Contribution
It provides a large, annotated Russian corpus for pharmacological entity recognition and analyzes the impact of different model modifications on extraction accuracy.
Findings
Achieves an F1 score of 61.1 for adverse drug reaction recognition.
Baseline coreference extraction precision is 71%.
Model modifications, such as the word vector representation and the type of language model pre-trained for Russian, significantly influence extraction performance.
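As a reading aid for the scores above, the sketch below shows one common way precision, recall, and F1 are computed for entity mentions. It assumes exact matching of span boundaries and label, which this summary does not confirm as the paper's metric. The `Mention` type, the `span_f1` function, and the toy spans are hypothetical and not taken from the corpus.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    label: str   # entity type, e.g. "ADR" or "Medication"


def span_f1(gold: set[Mention], predicted: set[Mention]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching.

    Assumption: a prediction counts as correct only if its start, end,
    and label all equal those of a gold mention.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold = {
    Mention(0, 9, "Medication"),
    Mention(20, 28, "ADR"),
    Mention(35, 42, "Disease"),
}
predicted = {
    Mention(0, 9, "Medication"),  # exact match: counts as a true positive
    Mention(20, 30, "ADR"),       # boundary differs: counts as an error
}

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under exact matching, a prediction that overlaps a gold mention but misses its boundary earns no credit, which is one reason scores on a rare entity type such as Adverse Drug Reaction can be noticeably lower than on frequent types.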
Abstract
We present a full-size Russian corpus of Internet user reviews with complex NER labeling, along with an evaluation of the accuracy reached on this corpus by a set of advanced deep learning neural networks for extracting pharmacologically meaningful entities from Russian texts. The corpus annotation includes mentions of the following entities: Medication (33005 mentions), Adverse Drug Reaction (1778), Disease (17403), and Note (4490). Two of them, Medication and Disease, comprise a set of attributes. Part of the corpus has coreference annotation, with 1560 coreference chains in 300 documents. A special multi-label model, based on a language model and a set of features and suited to the presented corpus labeling, is developed. The influence of the choice of different modifications of the models: word vector representations, types of language models pre-trained for Russian, text…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Misinformation and Its Impacts
