Neural sentence embedding models for semantic similarity estimation in the biomedical domain

Kathrin Blagec; Hong Xu; Asan Agibetov; Matthias Samwald

arXiv:2110.15708 · cs.CL · November 1, 2021

TL;DR

This paper evaluates neural sentence embedding models for biomedical semantic similarity, demonstrating that they can outperform ontology-dependent methods on benchmark datasets, though challenges remain in detecting contradictions.

Contribution

It trains neural sentence embedding models on biomedical literature (1.7 million PubMed Open Access articles), the best of which surpass previous state-of-the-art methods for semantic similarity estimation on the BIOSSES benchmark.
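
A common way to turn sentence embeddings into a similarity estimate is to score a sentence pair by the cosine similarity of its two vectors. The minimal pure-Python sketch below assumes the vectors have already been produced by some trained embedding model; the function name and the 4-dimensional example vectors are hypothetical, and the exact scoring function used in the paper is not stated on this page.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Undefined for zero vectors; treated here as "no similarity" (an assumption).
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional sentence embeddings (made-up values, not model output).
emb_a = [0.20, 0.10, 0.70, 0.00]
emb_b = [0.25, 0.05, 0.60, 0.10]
print(round(cosine_similarity(emb_a, emb_b), 3))
```

Cosine similarity ignores vector length, so two sentences whose embeddings point in the same direction score 1.0 regardless of magnitude.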

Findings

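01. Best unsupervised model (Paragraph Vector Distributed Memory) achieved Pearson r = 0.819 on the BIOSSES benchmark

02. Supervised model combining string-based similarity metrics with a neural embedding model reached r = 0.871

03. Models struggled to detect contradictions between biomedical sentences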
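
The r values above are Pearson correlations between a model's similarity scores and the expert annotations for the same sentence pairs. A minimal sketch of that computation follows; the `pearson_r` helper and the five score pairs are hypothetical and for illustration only.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical model similarity scores and expert ratings for five sentence pairs.
model_scores = [0.91, 0.35, 0.60, 0.12, 0.78]
expert_ratings = [3.8, 1.2, 2.5, 0.4, 3.0]
print(round(pearson_r(model_scores, expert_ratings), 3))
```

Because Pearson's r is unchanged by a positive linear rescaling of either list, model scores in [-1, 1] can be compared directly against expert ratings on a different numeric scale. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.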

Abstract

BACKGROUND: In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set. RESULTS: With a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised…
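
The abstract describes a supervised model that combines string-based similarity metrics with a neural embedding model. The sketch below illustrates the general idea under stated assumptions: token Jaccard and character 3-gram Jaccard stand in for the paper's string metrics, the embedding similarity is passed in as a precomputed number, and ordinary least-squares linear regression (with a tiny ridge term for numerical stability) is swapped in as the learner. None of these choices is confirmed as the authors' method, and all function names are hypothetical.

```python
def jaccard_tokens(s1: str, s2: str) -> float:
    """Jaccard overlap of lower-cased whitespace tokens (stand-in string metric)."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def qgram_jaccard(s1: str, s2: str, q: int = 3) -> float:
    """Jaccard overlap of character q-grams (stand-in string metric)."""
    def grams(s: str) -> set[str]:
        s = s.lower()
        return {s[i:i + q] for i in range(max(len(s) - q + 1, 0))}
    a, b = grams(s1), grams(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def features(s1: str, s2: str, embedding_sim: float) -> list[float]:
    """Hypothetical feature row: bias, two string metrics, one embedding score."""
    return [1.0, jaccard_tokens(s1, s2), qgram_jaccard(s1, s2), embedding_sim]


def fit_least_squares(X: list[list[float]], y: list[float],
                      ridge: float = 1e-6) -> list[float]:
    """Solve (X^T X + ridge*I) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    M = [A[i] + [b[i]] for i in range(d)]  # augmented matrix [A | b]
    for col in range(d):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(d):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][d] for i in range(d)]


def predict(w: list[float], x: list[float]) -> float:
    """Linear prediction of the similarity score for one feature row."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

To train, build one `features(s1, s2, sim)` row per annotated pair, pass the rows and the gold scores to `fit_least_squares`, then score new pairs with `predict`. A linear learner keeps each metric's contribution readable from its weight.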

Code & Models

Repositories

kathrinblagec/neural-sentence-embedding-models-for-biomedical-applications (official implementation, TensorFlow)