Novel Ranking-Based Lexical Similarity Measure for Word Embedding

Jakub Dutkiewicz; Czesław Jędrzejek

arXiv:1712.08439·cs.CL·December 25, 2017

TL;DR

This paper introduces a new ranking-based lexical similarity measure for word embeddings that improves semantic similarity tasks by refining vector comparisons and incorporating relational knowledge, outperforming existing methods.

Contribution

It proposes a novel ranking similarity measure and post-processing techniques that enhance the quality of word embeddings for semantic tasks.

Findings

01

Outperforms current literature on ESL and TOEFL datasets

02

Enrichments significantly improve semantic similarity results

03

Applicable to biological sequence representations in genomics

Abstract

Distributional semantics models derive a word space from linguistic items in context. Meaning is obtained by defining a distance measure between vectors corresponding to lexical entities. Such vectors present several problems. In this paper we provide a guideline for post-processing improvements to the baseline vectors. We focus on refining the similarity aspect and address imperfections of the model by applying a hubness reduction method, incorporating relational knowledge into the model, and providing a new ranking similarity definition that gives maximum weight to the top-1 component value. This feature ranking is similar to the one used in information retrieval. All these enrichments outperform current literature results for the joint ESL and TOEFL sets comparison. Since single word embedding is a basic element of any semantic task, one can expect a significant improvement of results for these…
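The abstract does not give the formula for the ranking similarity, so the following is only a rough sketch of the general idea of a rank-weighted comparison in which the top-1 component counts most. The function names, the `1/(rank+1)` weighting, and the cutoff `k` are all assumptions made for illustration and are not taken from the paper.

```python
def top_k_dims(vec, k):
    # Indices of the k largest components, largest first.
    # sorted() is stable, so tied values keep their index order.
    return sorted(range(len(vec)), key=lambda i: -vec[i])[:k]


def ranking_similarity(u, v, k=3):
    # Hypothetical weighting (an assumption, not the paper's definition):
    # rank 0, the top-1 component, gets weight 1, rank 1 gets 1/2, and so on.
    weights = [1.0 / (r + 1) for r in range(k)]

    # Map each top-k dimension of v to its rank in v.
    rank_in_v = {d: r for r, d in enumerate(top_k_dims(v, k))}

    # Add up weight products for dimensions that are top-k in both vectors.
    score = 0.0
    for r_u, d in enumerate(top_k_dims(u, k)):
        if d in rank_in_v:
            score += weights[r_u] * weights[rank_in_v[d]]

    # Normalize by the score of two identical rankings, so the maximum is 1.
    return score / sum(w * w for w in weights)
```

Under these assumptions, two vectors that share their strongest dimension score much higher than two vectors that share only a third-ranked dimension, which mirrors the information-retrieval style of ranking the abstract alludes to.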

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning in Bioinformatics · Topic Modeling