A data-driven strategy to combine word embeddings in information retrieval

Alfredo Silva; Marcelo Mendoza

arXiv:2105.12788·cs.IR·May 28, 2021

Open Access

TL;DR

This paper introduces a data-driven method for combining word embeddings to improve query representation in information retrieval, outperforming simple averaging techniques on benchmark datasets.

Contribution

It proposes a novel Idf-based combination strategy for word embeddings that enhances query descriptive capacity in ad-hoc retrieval tasks.

Findings

01 Idf-based embedding combinations outperform average embeddings

02 Experimental results on benchmark data validate the approach

03 Data-driven methods are promising for query representation

Abstract

Word embeddings are vital descriptors of words in unigram representations of documents for many tasks in natural language processing and information retrieval. Representing queries has been one of the most critical challenges in this area because a query consists of only a few terms and has little descriptive capacity. Strategies such as average word embeddings can enrich a query's descriptive capacity, since the continuous vector representations behind these approaches favor the identification of related terms. We propose a data-driven strategy to combine word embeddings. We use Idf combinations of embeddings to represent queries, showing that these representations outperform the average word embeddings recently proposed in the literature. Experimental results on benchmark data show that our proposal performs well, suggesting that data-driven combinations of word…
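To make the contrast between the two query representations concrete, here is a minimal sketch. The abstract does not give the exact formula, so this assumes the "Idf combination" is an Idf-weighted mean of the query terms' embeddings. The smoothed Idf variant, the function names (`compute_idf`, `average_embedding`, `idf_weighted_embedding`), and the toy corpus and vectors are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def compute_idf(documents):
    """Smoothed Idf per term: log((N + 1) / (df + 1)) + 1 (an assumed variant)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}


def average_embedding(terms, embeddings):
    """Baseline: unweighted mean of the embeddings of the known query terms."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def idf_weighted_embedding(terms, embeddings, idf, default_idf=1.0):
    """Assumed Idf combination: mean of term embeddings weighted by Idf."""
    known = [t for t in terms if t in embeddings]
    if not known:
        return None
    weights = [idf.get(t, default_idf) for t in known]
    total = sum(weights)
    dim = len(embeddings[known[0]])
    return [
        sum(w * embeddings[t][i] for w, t in zip(weights, known)) / total
        for i in range(dim)
    ]


# Toy data (hypothetical): "the" occurs in every document, "jaguar" in one.
corpus = [
    ["the", "jaguar", "habitat"],
    ["the", "car", "engine"],
    ["the", "river", "habitat"],
]
toy_embeddings = {"the": [1.0, 0.0], "jaguar": [0.0, 1.0]}
query = ["the", "jaguar"]

idf = compute_idf(corpus)
print("average:     ", average_embedding(query, toy_embeddings))
print("Idf-weighted:", idf_weighted_embedding(query, toy_embeddings, idf))
```

In this toy setup the Idf-weighted vector leans toward the rarer term "jaguar", while the plain average treats both terms equally. That is the intuition for why an Idf-based combination may give short queries more descriptive capacity.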

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques