Improve Lexicon-based Word Embeddings By Word Sense Disambiguation

Yuanzhi Ke; Masafumi Hagiwara

arXiv:1707.07628 · cs.CL · July 25, 2017

TL;DR

This paper introduces a novel lexicon-based word embedding method that incorporates word sense disambiguation to improve embeddings for polysemous words and enhances performance in text classification tasks.

Contribution

It proposes a new approach that considers the relatedness and differences between corpus and lexicon, using sense disambiguation to refine embeddings for polysemous words.

Findings

01. Improved embeddings for polysemous words.

02. Enhanced text classification performance.

03. Outperformed prior methods in word similarity tasks.

Abstract

Several prior works learn a lexicon together with the corpus to improve word embeddings. However, they either model the lexicon separately but update the neural networks for both the corpus and the lexicon by the same likelihood, or minimize the distance between all of the synonym pairs in the lexicon. Such methods do not consider the relatedness and difference of the corpus and the lexicon, and may not be optimal. In this paper, we propose a novel method that considers the relatedness and difference of the corpus and the lexicon. It trains word embeddings by learning from the corpus to predict a word and its corresponding synonym under the context at the same time. For polysemous words, we use a word sense disambiguation filter to eliminate the synonyms that have different meanings for the context. To evaluate the proposed method, we compare the performance of…
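
The idea described in the abstract, predicting both a word and its context-appropriate synonym from the same context, can be illustrated with a small sketch. This is not the authors' implementation: the toy `LEXICON`, the gloss-overlap filter (a simplified Lesk-style heuristic standing in for the paper's word sense disambiguation filter), and the function names `filter_synonyms` and `training_pairs` are all assumptions made for illustration. The sketch only builds the (context, target) pairs that a CBOW-style model could be trained on; it does not train embeddings.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (an assumption, not data from the paper):
# each word maps to a list of senses, each with gloss words and synonyms.
LEXICON: Dict[str, List[dict]] = {
    "bank": [
        {"gloss": {"money", "deposit", "loan", "account"}, "synonyms": ["depository"]},
        {"gloss": {"river", "water", "shore", "slope"}, "synonyms": ["riverside"]},
    ],
    "loan": [
        {"gloss": {"money", "borrow", "lend"}, "synonyms": ["credit"]},
    ],
}


def filter_synonyms(word: str, context: List[str]) -> List[str]:
    """Return synonyms of `word` that fit `context`.

    Monosemous words keep all synonyms. For polysemous words, keep only the
    synonyms of the sense whose gloss overlaps most with the context
    (a simplified Lesk-style stand-in for a WSD filter).
    """
    senses = LEXICON.get(word, [])
    if not senses:
        return []
    if len(senses) == 1:
        return list(senses[0]["synonyms"])
    ctx: Set[str] = set(context)
    best = max(senses, key=lambda s: len(s["gloss"] & ctx))
    if not best["gloss"] & ctx:
        # No evidence for any sense: drop all synonyms rather than guess.
        return []
    return list(best["synonyms"])


def training_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for each word and its filtered synonyms."""
    pairs: List[Tuple[List[str], str]] = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if not context:
            continue
        pairs.append((context, target))      # predict the word itself
        for syn in filter_synonyms(target, context):
            pairs.append((context, syn))     # predict its synonym from the same context
    return pairs


if __name__ == "__main__":
    for ctx, tgt in training_pairs(["deposit", "money", "bank", "loan", "account"]):
        print(ctx, "->", tgt)
```

In this sketch, the financial context keeps `depository` as a prediction target for `bank` and discards `riverside`, which mirrors the role the abstract assigns to the disambiguation filter.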

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies