Atypical lexical abbreviations identification in Russian medical texts

Anna Berdichevskaia (NUST "MISiS")

arXiv:2206.01987·cs.CL·June 7, 2022·1 cites

TL;DR

This paper presents a machine learning algorithm for identifying atypical lexical abbreviations in Russian medical texts, addressing comprehension issues for readers and providing a new dataset for this task.

Contribution

It introduces the first Russian dataset and a competitive ML-based method for detecting implicit abbreviations in medical texts.

Findings

01 ROC AUC score of 0.926

02 F1 score of 0.706

03 Competitive with baseline methods

Abstract

Abbreviation is a method of word formation that constructs a shortened term from the first letters of the initial phrase. Implicit abbreviations frequently cause comprehension difficulties for unprepared readers. In this paper, we propose an efficient ML-based algorithm that identifies abbreviations in Russian texts. The method achieves a ROC AUC score of 0.926 and an F1 score of 0.706, which are competitive with the baselines. Along with the pipeline, we also establish what is, to our knowledge, the first Russian dataset relevant to this task.
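For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ROC AUC and F1 are commonly computed for a binary "abbreviation vs. ordinary word" classifier. It is not the paper's evaluation code: the function names `f1_score` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration (NOT from the paper):
# 1 = token is an abbreviation, 0 = ordinary word.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]

# Assumed decision threshold of 0.5; the paper's threshold is not stated here.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"ROC AUC: {roc_auc(labels, scores):.3f}")
print(f"F1:      {f1_score(labels, preds):.3f}")
```

ROC AUC is computed from the raw scores and does not depend on a threshold, whereas F1 depends on the thresholded predictions. That difference is one reason the two reported numbers (0.926 and 0.706) can sit far apart for the same model.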

Code & Models

Repositories

aberdichevskaya/abbreviation_identification (Official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Linguistics and terminology studies