Leveraging knowledge graphs to update scientific word embeddings using latent semantic imputation

Jason Hoelscher-Obermaier; Edward Stevinson; Valentin Stauber; Ivaylo Zhelev; Victor Botev; Ronin Wu; Jeremy Minton

arXiv:2210.15358·cs.CL·October 28, 2022

Open Access

TL;DR

This paper introduces a method that uses latent semantic imputation with knowledge graphs to generate reliable embeddings for rare and new scientific terms without retraining, improving performance on biomedical word similarity tasks.

Contribution

The paper presents a novel approach leveraging knowledge graphs and latent semantic imputation to update word embeddings for rare and emerging scientific terms without retraining models.

Findings

01. LSI effectively imputes embeddings for rare biomedical terms

02. Imputed embeddings improve domain-specific word similarity performance

03. Method preserves original embedding quality for common terms

Abstract

The most interesting words in scientific texts will often be novel or rare. This makes it challenging for scientific word embedding models to produce high-quality embedding vectors for useful terms that are infrequent or newly emerging. We demonstrate how latent semantic imputation (LSI) can address this problem by imputing embeddings for domain-specific words from up-to-date knowledge graphs while otherwise preserving the original word embedding model. We use the MeSH knowledge graph to impute embedding vectors for biomedical terminology without retraining and evaluate the resulting embedding model on a domain-specific word-pair similarity task. We show that LSI can produce reliable embedding vectors for rare and out-of-vocabulary (OOV) terms in the biomedical domain.
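To make the idea of imputing embeddings from a graph more concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a simplified scheme in which each term missing from the embedding model is repeatedly set to the plain average of its graph neighbours, while vectors already in the model stay fixed. The function name `impute_embeddings`, the uniform neighbour weights, and the three toy terms are all hypothetical and chosen only for illustration; they are not MeSH data.

```python
from typing import Dict, List

Vector = List[float]


def impute_embeddings(
    known: Dict[str, Vector],
    graph: Dict[str, List[str]],
    dim: int,
    iterations: int = 100,
) -> Dict[str, Vector]:
    """Fill in vectors for graph nodes missing from `known` by neighbour averaging.

    Simplifying assumption: uniform neighbour weights. Every neighbour listed
    in `graph` must itself be a key of `graph`.
    """
    # Known vectors are copied unchanged; unknown nodes start at the zero vector.
    vectors = {node: list(known.get(node, [0.0] * dim)) for node in graph}
    unknown = [node for node in graph if node not in known]

    for _ in range(iterations):
        updated = {}
        for node in unknown:
            neighbours = graph[node]
            if not neighbours:
                continue  # isolated node: nothing to propagate from
            updated[node] = [
                sum(vectors[n][i] for n in neighbours) / len(neighbours)
                for i in range(dim)
            ]
        # Only unknown nodes are ever overwritten, so the original
        # embedding model is preserved.
        vectors.update(updated)
    return vectors


# Toy example with hypothetical terms: two terms have embeddings, one does not.
known = {"aspirin": [1.0, 0.0], "ibuprofen": [0.0, 1.0]}
graph = {
    "aspirin": ["new_drug"],
    "ibuprofen": ["new_drug"],
    "new_drug": ["aspirin", "ibuprofen"],
}
result = impute_embeddings(known, graph, dim=2)
print(result["new_drug"])  # [0.5, 0.5]
```

In this toy graph the new term ends up midway between its two neighbours, and the vectors for terms already in the model are returned unchanged, which mirrors the abstract's point that the original embedding model is otherwise preserved.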

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques