Evaluating the impact of word embeddings on similarity scoring in practical information retrieval
Niall McCarroll, Kevin Curran, Eugene McNamee, Angela Clist, Andrew Brammer

TL;DR
This paper demonstrates that combining Word Mover's Distance (WMD) with pre-trained word embeddings significantly improves semantic similarity measurement in information retrieval, outperforming Doc2Vec and LSA models.
Contribution
It introduces a novel similarity evaluation approach using WMD with word embeddings, showing superior performance over traditional centroid-based methods.
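The contrast between centroid-based similarity and a word-level transport distance can be illustrated with a minimal sketch. It is not the paper's implementation: the 2-D vectors in `TOY_VECTORS` are invented for illustration (a real system would load pre-trained embeddings such as GloVe), all function names are hypothetical, and `relaxed_wmd` computes the relaxed lower bound on Word Mover's Distance rather than the exact optimal-transport value.

```python
import math

# Hypothetical 2-D toy embeddings, invented for illustration only.
TOY_VECTORS = {
    "doctor": (1.0, 0.0),
    "physician": (0.9, 0.1),
    "bank": (0.0, 1.0),
    "river": (0.1, 0.9),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(tokens):
    """Mean of the token vectors (the centroid representation)."""
    vecs = [TOY_VECTORS[t] for t in tokens]
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def centroid_distance(doc_a, doc_b):
    """Distance between the two document centroids."""
    return euclidean(centroid(doc_a), centroid(doc_b))


def one_way_relaxed(src, dst):
    """Each source token carries mass 1/len(src) and moves it
    entirely to its nearest token in the destination document."""
    total = sum(
        min(euclidean(TOY_VECTORS[s], TOY_VECTORS[d]) for d in dst)
        for s in src
    )
    return total / len(src)


def relaxed_wmd(doc_a, doc_b):
    """Relaxed WMD: the larger of the two one-way costs.
    This is a lower bound on exact WMD, not WMD itself."""
    return max(one_way_relaxed(doc_a, doc_b), one_way_relaxed(doc_b, doc_a))


query = ["doctor", "bank"]
answer = ["physician", "river"]

print("centroid distance:", centroid_distance(query, answer))
print("relaxed WMD:      ", relaxed_wmd(query, answer))
```

With these toy vectors, the two documents share a centroid even though they have no words in common, so the centroid distance is effectively zero, while the relaxed WMD stays positive because every word still has to travel to its nearest counterpart. This is one way to see why averaging can discard word-level information that a transport-based measure retains.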
Findings
WMD + GloVe outperforms Doc2Vec and LSA models.
Significant accuracy improvements in query-response ranking.
Pre-trained embeddings provide domain-agnostic, portable solutions.
Abstract
Search behaviour is characterised using synonymy and polysemy as users often want to search information based on meaning. Semantic representation strategies represent a move towards richer associative connections that can adequately capture this complex usage of language. Vector Space Modelling (VSM) and neural word embeddings play a crucial role in modern machine learning and Natural Language Processing (NLP) pipelines. Embeddings use distributional semantics to represent words, sentences, paragraphs or entire documents as vectors in high dimensional spaces. This can be leveraged by Information Retrieval (IR) systems to exploit the semantic relatedness between queries and answers. This paper evaluates an alternative approach to measuring query statement similarity that moves away from the common similarity measure of centroids of neural word embeddings. Motivated by the Word Movers…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Graph Neural Networks
