Evaluating Biomedical BERT Models for Vocabulary Alignment at Scale in the UMLS Metathesaurus
Goonmeet Bajaj, Vinh Nguyen, Thilini Wijesiriwardene, Hong Yung Yip, Vishesh Javangula, Srinivasan Parthasarathy, Amit Sheth, Olivier Bodenreider

TL;DR
This study evaluates whether biomedical BERT models improve vocabulary alignment in the UMLS Metathesaurus and finds that traditional BioWordVec embeddings outperform BERT-based approaches for synonymy prediction.
Contribution
The paper systematically compares biomedical BERT models with existing methods for UMLS synonymy prediction, revealing BERT's limitations in this specific task.
Findings
BERT-based models do not outperform BioWordVec with Siamese Networks.
Original BioBERT outperforms SapBERT models pre-trained with UMLS.
Siamese Networks yield better results than BERT models for synonym prediction.
Abstract
The current UMLS (Unified Medical Language System) Metathesaurus construction process for integrating over 200 biomedical source vocabularies is expensive and error-prone, as it relies on lexical algorithms and human editors to decide whether two biomedical terms are synonymous. Recent advances in Natural Language Processing, such as Transformer models like BERT and its biomedical variants with contextualized word embeddings, have achieved state-of-the-art (SOTA) performance on downstream tasks. We aim to validate whether these BERT-based approaches can actually outperform the existing approaches for predicting synonymy in the UMLS Metathesaurus. In the existing Siamese Networks with LSTM and BioWordVec embeddings, we replace the BioWordVec embeddings with biomedical BERT embeddings extracted from each BERT model using different extraction methods. In the Transformer…
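The abstract mentions extracting embeddings from each BERT model "using different ways of extraction" before they are fed to the Siamese network. The following is a minimal sketch, under stated assumptions, of two common extraction strategies: taking the first-token ([CLS]) vector and mean-pooling over non-padding tokens. It also includes a cosine-similarity synonymy check. The hidden-state values are toy numbers standing in for a biomedical BERT model's last-layer output. The function names and the 0.9 threshold are hypothetical. The cosine threshold is a deliberately simplified stand-in for the trained Siamese LSTM classifier, not the authors' model, and this summary does not state which extraction methods the paper actually used.

```python
import math
from typing import List

Vector = List[float]


def cls_pooling(token_states: List[Vector]) -> Vector:
    """Hypothetical helper: use the first token's hidden state ([CLS] in BERT-style input)."""
    return list(token_states[0])


def mean_pooling(token_states: List[Vector], mask: List[int]) -> Vector:
    """Hypothetical helper: average hidden states over real tokens (mask == 1), skipping padding."""
    dim = len(token_states[0])
    total = [0.0] * dim
    count = 0
    for state, keep in zip(token_states, mask):
        if keep:
            for j, x in enumerate(state):
                total[j] += x
            count += 1
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [x / count for x in total]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_synonymy(emb_a: Vector, emb_b: Vector, threshold: float = 0.9) -> bool:
    """Simplified stand-in for a trained Siamese classifier: threshold the cosine similarity."""
    return cosine(emb_a, emb_b) >= threshold


# Usage with toy hidden states: 3 real tokens plus 1 padding token (mask 0).
states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(cls_pooling(states))         # [1.0, 0.0]
print(mean_pooling(states, mask))  # [0.5, 0.5]
```

The two pooling choices can yield quite different term vectors for the same input. Here the padding row is excluded from the mean, which is why the mask matters. The mask convention (1 for real tokens, 0 for padding) mirrors the attention masks commonly produced by BERT tokenizers.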
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Weight Decay · Linear Warmup With Linear Decay · Dropout · Sigmoid Activation · Layer Normalization
