A data-driven strategy to combine word embeddings in information retrieval
Alfredo Silva, Marcelo Mendoza

TL;DR
This paper introduces a data-driven method for combining word embeddings to improve query representation in information retrieval, outperforming simple averaging techniques on benchmark datasets.
Contribution
It proposes a novel Idf-based combination strategy for word embeddings that enhances query descriptive capacity in ad-hoc retrieval tasks.
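One plausible reading of an Idf-based combination is an IDF-weighted mean of the query terms' vectors, contrasted below with the plain average. This is an assumption, not the paper's stated formula. Here Q is the set of query terms, v_t the embedding of term t, N the number of documents in the collection, and df(t) the number of documents that contain t:

```latex
\mathbf{q}_{\text{avg}} = \frac{1}{|Q|}\sum_{t \in Q} \mathbf{v}_t,
\qquad
\mathbf{q}_{\text{idf}} = \frac{\sum_{t \in Q} \mathrm{idf}(t)\,\mathbf{v}_t}{\sum_{t \in Q} \mathrm{idf}(t)},
\qquad
\mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)}
```

Under this reading, a term that occurs in every document has idf(t) = 0 and drops out of the query vector. Rare, more discriminative terms dominate it.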
Findings
Idf-based embedding combinations outperform average embeddings
Experimental results on benchmark data validate the approach
Data-driven methods are promising for query representation
Abstract
Word embeddings are vital descriptors of words in unigram representations of documents for many tasks in natural language processing and information retrieval. The representation of queries has been one of the most critical challenges in this area because queries consist of only a few terms and have little descriptive capacity. Strategies such as average word embeddings can enrich the queries' descriptive capacity, since they favor the identification of related terms from the continuous vector representations that characterize these approaches. We propose a data-driven strategy to combine word embeddings. We use Idf combinations of embeddings to represent queries, showing that these representations outperform the average word embeddings recently proposed in the literature. Experimental results on benchmark data show that our proposal performs well, suggesting that data-driven combinations of word…
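The Python sketch below illustrates the contrast drawn in the abstract. It compares a plain average of word vectors with an IDF-weighted combination and adds a cosine score of the kind used to rank documents against a query. It assumes the textbook weighting idf(t) = log(N / df(t)). The function names (idf_weights, average_embedding, idf_embedding, cosine), the toy corpus, and the 2-dimensional vectors are hypothetical and do not come from the paper, whose exact combination rule may differ.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating IDF-weighted query embeddings.
# Assumption: idf(t) = log(N / df(t)); the paper's exact weighting may differ.

Vector = list[float]


def idf_weights(corpus: list[list[str]]) -> dict[str, float]:
    """Compute idf(t) = log(N / df(t)) from a corpus of tokenized documents."""
    n_docs = len(corpus)
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def average_embedding(query: list[str], emb: dict[str, Vector], dim: int) -> Vector:
    """Baseline: unweighted mean of the query terms' vectors (unknown terms skipped)."""
    vecs = [emb[t] for t in query if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def idf_embedding(query: list[str], emb: dict[str, Vector],
                  idf: dict[str, float], dim: int) -> Vector:
    """IDF-weighted mean: terms that occur in many documents contribute less."""
    acc = [0.0] * dim
    total = 0.0
    for t in query:
        if t in emb and t in idf:
            w = idf[t]
            total += w
            for i in range(dim):
                acc[i] += w * emb[t][i]
    if total == 0.0:
        return acc  # no informative terms: return the zero vector
    return [x / total for x in acc]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, e.g. to score a document vector against a query vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy usage with made-up data.
corpus = [["the", "neural", "ranking"], ["the", "cat"], ["the", "query", "ranking"]]
emb = {"the": [1.0, 0.0], "neural": [0.0, 1.0]}  # toy 2-d vectors
idf = idf_weights(corpus)
print(average_embedding(["the", "neural"], emb, 2))  # [0.5, 0.5]
print(idf_embedding(["the", "neural"], emb, idf, 2))  # [0.0, 1.0]
```

In the toy corpus, "the" appears in every document, so its IDF weight is zero. The IDF-weighted query vector therefore points entirely toward "neural", whereas the plain average gives both terms equal weight.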
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
