Toward Incorporation of Relevant Documents in word2vec

Navid Rekabsaz; Bhaskar Mitra; Mihai Lupu; Allan Hanbury

arXiv:1707.06598·cs.IR·April 5, 2018·6 cites

TL;DR

This paper proposes a neural-based explicit word representation method inspired by word2vec, which improves interpretability and incorporates local document information to enhance word similarity tasks in information retrieval.

Contribution

It introduces a new explicit word embedding method that maintains effectiveness while enabling the integration of local document information for IR tasks.

Findings

1. The proposed explicit representation outperforms existing explicit methods in word similarity ranking.

2. The method retains the effectiveness of the Skip-Gram model while offering interpretability.

3. Initial results show promising integration of local document information into global embeddings.

Abstract

Recent advances in neural word embedding provide significant benefit to various information retrieval tasks. However, as shown by recent studies, adapting the embedding models to the needs of IR tasks can bring considerable further improvements. Embedding models in general define term relatedness by exploiting the terms' co-occurrences in short-window contexts. An alternative (and well-studied) approach in IR for finding terms related to a query is to use local information, i.e., a set of top-retrieved documents. In view of these two methods of term relatedness, we report in this work our study on incorporating the local information of the query into the word embeddings. One main challenge in this direction is that the dense vectors of word embeddings and their estimation of term-to-term relatedness remain difficult to interpret and hard to analyze. As an alternative, explicit word…
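
The two notions of term relatedness contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the function names `window_cooccurrences` and `local_related_terms`, the default window size, and the raw counting scheme are all assumptions chosen for illustration. The first function counts co-occurrences in short-window contexts (the signal embedding models exploit); the second counts candidate terms across a set of top-retrieved documents that contain a query term (the local signal).

```python
from collections import Counter


def window_cooccurrences(tokens, window=2):
    """Count unordered term pairs that appear within `window` positions.

    Hypothetical helper: a raw-count stand-in for the short-window
    co-occurrence signal that embedding models are trained on.
    """
    counts = Counter()
    for i, term in enumerate(tokens):
        # Look only ahead, so each pair of positions is counted once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair = tuple(sorted((term, tokens[j])))
            counts[pair] += 1
    return counts


def local_related_terms(query_terms, top_docs):
    """Count, per candidate term, the top-retrieved documents in which it
    appears together with at least one query term.

    Hypothetical helper: `top_docs` is assumed to be a list of tokenized
    documents already returned by some retrieval step.
    """
    query = set(query_terms)
    counts = Counter()
    for doc in top_docs:
        vocabulary = set(doc)
        if vocabulary & query:
            # Each document contributes at most one count per term.
            counts.update(vocabulary - query)
    return counts
```

The window-based counts depend only on word order in the whole collection, whereas the local counts change with every query, which is why the two signals can disagree about which terms are related.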

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques