Atypical lexical abbreviations identification in Russian medical texts
Anna Berdichevskaia (NUST "MISiS")

TL;DR
This paper presents a machine learning algorithm for identifying atypical lexical abbreviations in Russian medical texts, addressing comprehension issues for readers and providing a new dataset for this task.
Contribution
It introduces what the authors describe as the first Russian dataset for this task, along with a competitive ML-based method for detecting implicit abbreviations in medical texts.
Findings
ROC AUC score of 0.926
F1 score of 0.706
Competitive with the baseline methods
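The two headline metrics can be made concrete with a small from-scratch computation. This is a minimal sketch that assumes a binary token-level setup (1 = abbreviation, 0 = ordinary word) and a 0.5 decision threshold. The labels and scores are invented toy values, not data from the paper, and this page does not describe the paper's exact evaluation protocol.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = abbreviation)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(P * N), which is fine for a toy example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: gold labels and model scores for six tokens.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(f1_score(y_true, preds), 4))  # 0.6667
print(round(roc_auc(y_true, scores), 4))  # 0.7778
```

F1 depends on the chosen threshold, while ROC AUC is threshold-free because it only looks at how scores rank positives against negatives. This is why the two numbers can diverge, as they do in the reported results.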
Abstract
Abbreviation is a method of word formation that constructs a shortened term from the first letters of the initial phrase. Implicit abbreviations frequently cause comprehension difficulties for unprepared readers. In this paper, we propose an efficient ML-based algorithm that identifies abbreviations in Russian texts. The method achieves a ROC AUC of 0.926 and an F1 score of 0.706, which are competitive with the baselines. Along with the pipeline, we also establish what is, to our knowledge, the first Russian dataset relevant to this task.
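A short sketch can illustrate both the definition above and the reason a learned model is useful. The helper names `initialism` and `naive_candidates` are hypothetical. The all-caps rule is only an illustrative heuristic. It is neither the paper's method nor one of its baselines.

```python
import re
from typing import List


def initialism(phrase: str) -> str:
    """Build a shortened term from the first letter of each word (hypothetical helper)."""
    return "".join(word[0] for word in phrase.split()).upper()


def naive_candidates(text: str, max_len: int = 6) -> List[str]:
    """Flag short all-uppercase tokens as abbreviation candidates.

    Illustrative heuristic only (hypothetical); not the paper's method.
    """
    tokens = re.findall(r"\w+", text)  # \w is Unicode-aware, so Cyrillic letters match
    return [t for t in tokens if 2 <= len(t) <= max_len and t.isupper()]


print(initialism("ишемическая болезнь сердца"))  # ИБС
print(naive_candidates("Пациент с ИБС и ОНМК в анамнезе"))  # ['ИБС', 'ОНМК']
print(naive_candidates("введено в/в"))  # [] (the lowercase "в/в" is missed)
```

A rule like this catches conventional all-caps forms such as ИБС (ишемическая болезнь сердца). It misses lowercase or punctuated forms such as в/в (внутривенно). Such gaps are one plausible reason that a learned classifier helps with implicit abbreviations.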
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Linguistics and Terminology Studies
