TL;DR
This paper evaluates neural sentence embedding models for biomedical semantic similarity. It shows that they can outperform previous state-of-the-art and ontology-dependent methods on the BIOSSES benchmark, although detecting contradictions remains challenging.
Contribution
It introduces neural embedding models trained on 1.7 million PubMed Open Access articles that surpass previous state-of-the-art methods for biomedical semantic similarity estimation.
Findings
Best unsupervised model (Paragraph Vector Distributed Memory) achieved Pearson r = 0.819 on BIOSSES
Supervised model combining string-based similarity metrics with a neural embedding model reached r = 0.871
Models struggled with contradiction detection in biomedical sentences
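The summary does not say which string metrics or which regressor the supervised model uses. The sketch below only illustrates the general idea of feeding string-based scores and an embedding score into a learned model, under assumed choices. Token-set Jaccard similarity and `difflib`'s character-level ratio stand in for the string metrics. The cosine of two precomputed sentence embeddings stands in for the neural component. Ordinary least squares with a tiny ridge term stands in for the supervised model. All function names are hypothetical.

```python
import difflib
import math


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (hypothetical stand-in string metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib (another stand-in)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two precomputed sentence embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def features(a: str, b: str, emb_a: list[float], emb_b: list[float]) -> list[float]:
    # Bias term, two string metrics, then the embedding-based score.
    return [1.0, jaccard(a, b), char_ratio(a, b), cosine(emb_a, emb_b)]


def fit_least_squares(X: list[list[float]], y: list[float], ridge: float = 1e-6) -> list[float]:
    """Solve (X^T X + ridge * I) w = X^T y with Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def predict(w: list[float], x: list[float]) -> float:
    """Linear similarity score for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

A linear model keeps the learned weights interpretable: each weight shows how much one metric contributes to the final score. Any other regressor could be swapped in over the same feature vectors.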
Abstract
BACKGROUND: In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set. RESULTS: With a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised…
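The evaluation in the abstract correlates model similarity scores with expert annotations using Pearson correlation. The following is a minimal sketch under two assumptions that the summary does not state: each sentence pair is scored by the cosine similarity of its two embedding vectors, and those vectors are already computed (for example, inferred by a PV-DM model). `evaluate` and the toy vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two sentence vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def evaluate(pairs: list[tuple[list[float], list[float]]], gold: list[float]) -> float:
    """Correlate predicted cosine scores with human similarity scores."""
    predicted = [cosine(a, b) for a, b in pairs]
    return pearson(predicted, gold)
```

Pearson r measures linear agreement. Any model whose scores are a positive linear rescaling of the human scores therefore attains r = 1, regardless of the annotation scale.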
