Toward Incorporation of Relevant Documents in word2vec
Navid Rekabsaz, Bhaskar Mitra, Mihai Lupu, Allan Hanbury

TL;DR
This paper proposes a neural-based explicit word representation method inspired by word2vec, which improves interpretability and incorporates local document information to enhance word similarity tasks in information retrieval.
Contribution
It introduces a new explicit word embedding method that maintains effectiveness while enabling the integration of local document information for IR tasks.
Findings
The proposed explicit representation outperforms existing explicit methods in word similarity ranking.
The method retains the effectiveness of the Skip-Gram model while offering interpretability.
Initial results show promising integration of local document information into global embeddings.
Abstract
Recent advances in neural word embedding provide significant benefit to various information retrieval tasks. However, as shown by recent studies, adapting the embedding models to the needs of IR tasks can bring considerable further improvements. Embedding models in general define term relatedness by exploiting the terms' co-occurrences in short-window contexts. An alternative (and well-studied) approach in IR for finding terms related to a query is to use local information, i.e., a set of top-retrieved documents. In view of these two methods of term relatedness, we report in this work our study on incorporating the local information of the query in the word embeddings. One main challenge in this direction is that the dense vectors of word embeddings and their estimation of term-to-term relatedness remain difficult to interpret and hard to analyze. As an alternative, explicit word…
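The two notions of term relatedness contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it uses raw co-occurrence counts rather than a trained Skip-Gram-style representation, and the toy corpus, the function names (`window_cooccurrence`, `cosine`, `related_terms`), and the keyword filter standing in for a retrieval step are all assumptions made for illustration. Each word is represented explicitly, with one interpretable dimension per context word, first over the whole corpus (global) and then over only the documents "retrieved" for a query (local).

```python
import math
from collections import Counter, defaultdict


def window_cooccurrence(docs, window=2):
    """Build explicit vectors: for each term, count context terms within `window` positions."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_terms(term, vectors, k=3):
    """Return the k terms most similar to `term`; ties are broken alphabetically."""
    target = vectors.get(term, Counter())
    scores = {
        other: cosine(target, vec)
        for other, vec in vectors.items()
        if other != term
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the jaguar car has a fast engine",
    "the jaguar cat hunts in the forest",
    "a fast car needs a strong engine",
    "the forest cat hunts at night",
]

# Global relatedness: co-occurrence statistics over the whole corpus.
global_vectors = window_cooccurrence(corpus)

# Local relatedness: statistics over "top-retrieved" documents only.
# A keyword filter stands in for a real retrieval model here (an assumption).
query = "engine"
top_retrieved = [d for d in corpus if query in d.lower().split()]
local_vectors = window_cooccurrence(top_retrieved)

print(related_terms("jaguar", global_vectors))
print(related_terms("jaguar", local_vectors))
```

Because every dimension is a context word, the reason two terms are scored as similar can be read directly from their shared non-zero dimensions, which is the kind of interpretability that dense embeddings lack.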
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
