SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology
Mikhail Kulyabin, Gleb Sokolov, Aleksandr Galaida, Andreas Maier, Tomas Arias-Vergara

TL;DR
SNOBERT is a BERT-based method designed to link clinical note text spans to SNOMED CT concepts, improving automation in medical coding by leveraging large labeled datasets.
Contribution
The paper introduces SNOBERT, a novel two-stage BERT-based approach for clinical entity linking to SNOMED CT, trained on a large dataset and outperforming classical methods.
Findings
SNOBERT outperforms classical deep learning methods.
The approach was validated through a challenge with positive results.
It demonstrates effective linking of clinical text to SNOMED CT concepts.
Abstract
The extraction and analysis of insights from medical data, primarily stored in free-text formats by healthcare workers, presents significant challenges due to its unstructured nature. Medical coding, a crucial process in healthcare, remains minimally automated due to the complexity of medical ontologies and restricted access to medical texts for training Natural Language Processing models. In this paper, we propose a method, "SNOBERT," for linking text spans in clinical notes to specific concepts in SNOMED CT using BERT-based models. The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available datasets of labeled clinical notes. SNOBERT outperforms other classical methods based on deep learning, as confirmed by the results of a challenge in which it was applied.
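The two stages named in the abstract, candidate selection followed by candidate matching, can be illustrated with a minimal sketch. This is not the authors' implementation: a regular expression stands in for the BERT-based span detector, character trigram vectors stand in for BERT embeddings, and the concept identifiers (`C001`, ...), the function names, and the `min_score` threshold are all hypothetical placeholders rather than real SNOMED CT codes or SNOBERT parameters.

```python
import math
import re
from collections import Counter

# Hypothetical mini-terminology: concept id -> preferred term.
# These ids are made-up placeholders, NOT real SNOMED CT identifiers.
CONCEPTS = {
    "C001": "chest pain",
    "C002": "shortness of breath",
    "C003": "hypertension",
}

# Toy mention detector (assumption): a trained token classifier would go here.
MENTION_PATTERN = re.compile(
    r"chest pains?|shortness of breath|hypertension", re.IGNORECASE
)


def char_ngrams(text, n=3):
    """Bag of character n-grams; a crude stand-in for a learned embedding."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidates(note):
    """Stage 1: find text spans that may mention a concept."""
    return [(m.start(), m.end(), m.group()) for m in MENTION_PATTERN.finditer(note)]


def match_candidate(span_text, min_score=0.5):
    """Stage 2: pick the most similar concept, or None if below min_score."""
    span_vec = char_ngrams(span_text)
    best_id, best_score = None, 0.0
    for concept_id, term in CONCEPTS.items():
        score = cosine(span_vec, char_ngrams(term))
        if score > best_score:
            best_id, best_score = concept_id, score
    return (best_id, best_score) if best_score >= min_score else (None, best_score)


def link_note(note):
    """Run both stages and return (span text, concept id, score) triples."""
    links = []
    for _start, _end, text in select_candidates(note):
        concept_id, score = match_candidate(text)
        if concept_id is not None:
            links.append((text, concept_id, score))
    return links


if __name__ == "__main__":
    for text, concept_id, score in link_note(
        "Patient reports chest pains and shortness of breath."
    ):
        print(f"{text!r} -> {concept_id} (similarity {score:.2f})")
```

Separating the stages this way means the span detector and the concept matcher can be swapped or retrained independently, which is one plausible reason to structure an entity linker as a two-stage pipeline.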
Taxonomy
Topics: Linguistics and Terminology Studies · Biomedical Text Mining and Ontologies · Medical and Biological Sciences
