Leveraging knowledge graphs to update scientific word embeddings using latent semantic imputation
Jason Hoelscher-Obermaier, Edward Stevinson, Valentin Stauber, Ivaylo Zhelev, Victor Botev, Ronin Wu, Jeremy Minton

TL;DR
This paper introduces a method using latent semantic imputation with knowledge graphs to generate reliable embeddings for rare and new scientific terms without retraining, improving biomedical word similarity tasks.
Contribution
The paper presents a novel approach leveraging knowledge graphs and latent semantic imputation to update word embeddings for rare and emerging scientific terms without retraining models.
Findings
LSI effectively imputes embeddings for rare biomedical terms
Imputed embeddings improve domain-specific word similarity performance
Method preserves original embedding quality for common terms
Abstract
The most interesting words in scientific texts will often be novel or rare. This presents a challenge for scientific word embedding models to determine quality embedding vectors for useful terms that are infrequent or newly emerging. We demonstrate how latent semantic imputation (LSI) can address this problem by imputing embeddings for domain-specific words from up-to-date knowledge graphs while otherwise preserving the original word embedding model. We use the MeSH knowledge graph to impute embedding vectors for biomedical terminology without retraining and evaluate the resulting embedding model on a domain-specific word-pair similarity task. We show that LSI can produce reliable embedding vectors for rare and out-of-vocabulary (OOV) terms in the biomedical domain.
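The imputation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `impute_embeddings`, its parameters, and the toy setup are hypothetical. It assumes every term already has a vector in some domain space (for example, node vectors derived from a knowledge graph) and that only some terms have word embeddings. Each unknown term is expressed as a weighted combination of its nearest neighbours in the domain space, and those weights are then used to propagate the known embeddings while the known rows stay fixed. As a simplification, the weights come from a regularised least-squares fit that is clipped to be non-negative and normalised, rather than from a properly constrained solver.

```python
import numpy as np


def impute_embeddings(domain_vecs, known_emb, known_idx, k=3, n_iter=200, reg=1e-3):
    """Hypothetical sketch of embedding imputation from a domain space.

    domain_vecs: (n, d_dom) domain-space vectors for all n terms.
    known_emb:   (m, d_emb) embeddings for the terms listed in known_idx.
    Returns an (n, d_emb) array; rows in known_idx are left unchanged.
    """
    n = domain_vecs.shape[0]
    known_idx = np.asarray(known_idx)
    is_known = np.zeros(n, dtype=bool)
    is_known[known_idx] = True

    # Pairwise distances in the domain space (dense, fine for a toy example).
    dists = np.linalg.norm(domain_vecs[:, None, :] - domain_vecs[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a term is never its own neighbour

    # Reconstruction weights for each unknown term from its k nearest neighbours.
    weights = np.zeros((n, n))
    for i in np.flatnonzero(~is_known):
        nbrs = np.argsort(dists[i])[:k]
        diff = domain_vecs[nbrs] - domain_vecs[i]
        gram = diff @ diff.T
        gram += reg * (np.trace(gram) + 1e-12) * np.eye(len(nbrs))
        w = np.linalg.solve(gram, np.ones(len(nbrs)))
        w = np.clip(w, 0.0, None)  # simplification: clip instead of constrained fit
        if w.sum() == 0.0:
            w = np.ones(len(nbrs))
        weights[i, nbrs] = w / w.sum()

    # Propagate embeddings through the weights, holding known rows fixed.
    emb = np.zeros((n, known_emb.shape[1]))
    emb[known_idx] = known_emb
    for _ in range(n_iter):
        propagated = weights @ emb
        emb[~is_known] = propagated[~is_known]
    return emb
```

In this sketch, an unknown term that sits exactly midway between two known terms in the domain space receives the midpoint of their embeddings. Unknown terms with no path to any known term through the neighbour graph stay at zero, so a real application would need to check connectivity first.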
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
