Using Word Embeddings for Automatic Query Expansion
Dwaipayan Roy, Debjyoti Paul, Mandar Mitra, Utpal Garain

TL;DR
This paper introduces a word2vec-based framework for automatic query expansion. It improves retrieval performance over standard term-overlap retrieval methods but remains less effective than statistical co-occurrence-based feedback techniques.
Contribution
Proposes a novel unsupervised query expansion method using word2vec embeddings and K-nearest neighbors, with comprehensive experimental evaluation.
Findings
Significant improvement over standard term-overlapping retrieval methods
Performs similarly with and without feedback information
Less effective than statistical co-occurrence based feedback methods like RM3
Abstract
This paper proposes a framework for Automatic Query Expansion (AQE) using the distributed neural language model word2vec. Using semantic and contextual relations in a distributed and unsupervised framework, word2vec learns a low-dimensional embedding for each vocabulary entry. Using such a framework, we devise a query expansion technique in which terms related to a query are obtained by a K-nearest neighbor approach. We explore the performance of the AQE methods, with and without feedback query expansion, and a variant of simple K-nearest neighbor in the proposed framework. Experiments on standard TREC ad-hoc data (Disks 4 and 5 with query sets 301-450, 601-700) and web data (WT10G with query set 451-550) show significant improvement over standard term-overlap-based retrieval methods. However, the proposed method fails to achieve comparable performance with statistical co-occurrence…
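The K-nearest neighbor idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny hand-written vectors in TOY_EMBEDDINGS stand in for trained word2vec embeddings, the function names cosine and expand_query are hypothetical, and scoring candidates by their mean cosine similarity to the query terms is one plausible aggregation, not necessarily the exact scoring used in the paper.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2):
    """Append the k vocabulary terms closest to the query in embedding space.

    Assumption: a candidate's score is its mean cosine similarity to the
    query terms that have an embedding. Query terms themselves are skipped.
    """
    known = [t for t in query_terms if t in embeddings]
    if not known:
        # No query term has an embedding, so there is nothing to expand from.
        return list(query_terms)

    scores = {}
    for term, vec in embeddings.items():
        if term in query_terms:
            continue
        sims = [cosine(vec, embeddings[q]) for q in known]
        scores[term] = sum(sims) / len(sims)

    nearest = sorted(scores, key=scores.get, reverse=True)[:k]
    return list(query_terms) + nearest


if __name__ == "__main__":
    print(expand_query(["car"], TOY_EMBEDDINGS, k=2))
```

In practice the toy dictionary would be replaced by vectors trained on the target collection, and the expanded terms would typically be weighted before being passed to the retrieval model.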
Taxonomy
Topics: Topic Modeling · Web Data Mining and Analysis · Semantic Web and Ontologies
