A Dual Embedding Space Model for Document Ranking
Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana

TL;DR
This paper introduces the Dual Embedding Space Model (DESM), which retains both the input and output embeddings of a word2vec model trained on a query corpus. Query words are mapped into the input space and document words into the output space, and relevance is scored from the cosine similarities between them. The approach outperforms traditional term-matching methods when re-ranking top documents.
Contribution
The paper proposes DESM, a relevance-scoring approach for document ranking that uses both the input and output embedding spaces of word2vec, and shows how this semantic evidence can be combined with traditional term-matching methods.
Findings
DESM improves re-ranking of top documents over TF-IDF.
Embedding-based ranking alone can produce false positives.
Combining DESM with word count features enhances ranking accuracy.
Abstract
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding…
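The scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy embedding tables `IN_EMB` and `OUT_EMB`, the function names, and the choice of a plain mean as the aggregation over query-document word pairs are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def desm_score(query_words, doc_words, in_emb, out_emb):
    """Score a document against a query using two embedding spaces.

    Query words are looked up in the input-space table (in_emb) and
    document words in the output-space table (out_emb). The plain mean
    over all word pairs is an assumed aggregation for this sketch.
    Words missing from a table are skipped.
    """
    sims = [
        cosine(in_emb[q], out_emb[d])
        for q in query_words if q in in_emb
        for d in doc_words if d in out_emb
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical 2-dimensional embeddings, invented purely for illustration.
IN_EMB = {"cambridge": [1.0, 0.0]}
OUT_EMB = {"university": [1.0, 0.0], "giraffe": [0.0, 1.0]}

score_on_topic = desm_score(["cambridge"], ["university"], IN_EMB, OUT_EMB)
score_off_topic = desm_score(["cambridge"], ["giraffe"], IN_EMB, OUT_EMB)
score_mixed = desm_score(["cambridge"], ["university", "giraffe"], IN_EMB, OUT_EMB)
```

With these toy vectors, a document containing only "university" scores higher than one containing only "giraffe", and a document containing both falls in between. Real embeddings would come from a trained word2vec model with both projection matrices kept.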
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Information Retrieval and Search Behavior
