Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents
Sun Kim, Nicolas Fiorini, W. John Wilbur, Zhiyong Lu

TL;DR
This paper introduces a neural-embedding-based semantic similarity measure for matching PubMed queries to documents. The measure outperforms traditional IR methods and further improves ranking accuracy when combined with BM25.
Contribution
The paper proposes a novel similarity measure, inspired by the Word Mover's Distance, that uses neural word embeddings to improve retrieval performance in biomedical literature search.
Findings
Outperforms BM25 by 12% in mean average precision on TREC Genomics data.
Combining the semantic measure with BM25 improves ranking scores by up to 25%.
The method is efficient and easy to implement.
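The Findings above report gains from combining the semantic measure with BM25, but this summary does not say how the two scores are merged. One common scheme is linear interpolation of normalized scores. The sketch below assumes that scheme, along with min-max normalization over the candidate set, the usual BM25 defaults k1 = 1.2 and b = 0.75, a Lucene-style non-negative IDF, and an interpolation weight alpha = 0.5. These are all illustrative choices, not the authors' settings. The semantic scores are made-up placeholders that stand in for an embedding-based measure.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per doc
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue  # BM25 gives no credit for terms absent from the doc
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def min_max(xs):
    """Rescale scores to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

def combine(bm25, semantic, alpha=0.5):
    """Hypothetical linear interpolation of normalized BM25 and semantic scores."""
    return [alpha * a + (1 - alpha) * s
            for a, s in zip(min_max(bm25), min_max(semantic))]

docs = [
    ["brca1", "mutation", "breast", "cancer"],
    ["tumor", "suppressor", "brca1"],
    ["weather", "forecast"],
]
query = ["breast", "cancer", "gene"]
# Placeholder semantic similarities, one per doc (not real model output).
semantic = [0.9, 0.8, 0.1]

bm25 = bm25_scores(query, docs)
final = combine(bm25, semantic, alpha=0.5)
ranking = sorted(range(len(docs)), key=lambda i: final[i], reverse=True)
print(bm25, final, ranking)
```

The second document shares no terms with the query, so its BM25 score is zero. Its semantic score still lifts it above the unrelated third document, which is the kind of effect the combined ranking is meant to capture. In practice alpha would be tuned on held-out queries.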
Abstract
The main approach of traditional information retrieval (IR) is to examine how many words from a query appear in a document. A drawback of this approach, however, is that it may fail to detect relevant documents in which few or none of the query words are found. Semantic analysis methods such as LSA (latent semantic analysis) and LDA (latent Dirichlet allocation) have been proposed to address this issue, but their performance is not superior to that of common IR approaches. Here we present a query-document similarity measure motivated by the Word Mover's Distance. Unlike other similarity measures, the proposed method relies on neural word embeddings to compute the distance between words. This process helps identify related words when no direct matches are found between a query and a document. Our method is efficient and straightforward to implement. The experimental results on TREC…
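The abstract describes the mechanism only at a high level. As a rough illustration, the sketch below implements a relaxed Word Mover's Distance: each query word moves all of its weight to its single nearest document word in embedding space, which drops the document-side flow constraint of full WMD. The cosine distance, the optional IDF weighting, the helper names, and the toy three-dimensional embeddings are all assumptions for illustration. They are not the paper's exact formulation or its trained vectors.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def relaxed_wmd(query, document, embeddings, idf=None):
    """Relaxed Word Mover's Distance from a tokenized query to a tokenized document.

    Each in-vocabulary query word sends all of its weight to its nearest
    in-vocabulary document word. Weights are IDF values when `idf` is given,
    otherwise uniform. Lower values mean the document is closer to the query.
    """
    q_words = [w for w in query if w in embeddings]
    d_words = [w for w in document if w in embeddings]
    if not q_words or not d_words:
        return math.inf  # nothing to compare in embedding space
    weights = [idf.get(w, 1.0) if idf else 1.0 for w in q_words]
    total = sum(weights)
    cost = 0.0
    for word, weight in zip(q_words, weights):
        nearest = min(cosine_distance(embeddings[word], embeddings[d]) for d in d_words)
        cost += (weight / total) * nearest
    return cost

def exact_overlap(query, document):
    """Number of distinct query words that literally appear in the document."""
    return len(set(query) & set(document))

# Hypothetical 3-d embeddings, hand-picked so related terms point the same way.
embeddings = {
    "cancer":   [0.90, 0.10, 0.00],
    "tumor":    [0.85, 0.15, 0.05],
    "gene":     [0.10, 0.90, 0.10],
    "brca1":    [0.15, 0.85, 0.20],
    "weather":  [0.00, 0.10, 0.95],
    "forecast": [0.05, 0.00, 1.00],
}

query = ["cancer", "gene"]
doc_related = ["tumor", "brca1"]
doc_unrelated = ["weather", "forecast"]

for name, doc in [("related", doc_related), ("unrelated", doc_unrelated)]:
    print(name, exact_overlap(query, doc), round(relaxed_wmd(query, doc, embeddings), 3))
```

Neither toy document shares a word with the query, so an exact-match count is zero for both. The relaxed distance nonetheless places the related document much closer, which illustrates the gap the abstract describes. Real use would load pretrained embeddings, such as word vectors trained on biomedical text; this sketch does not do that.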
