Contextual Skipgram: Training Word Representation Using Context Information

Dongjae Kim; Jong-Kook Kim

arXiv:2102.08565·cs.CL·February 18, 2021·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces Contextual Skip-gram, an improved word embedding model that uses context information to focus on relevant words, leading to better quality representations by reducing noise from irrelevant context words.

Contribution

It proposes a novel extension of the skip-gram model that incorporates context information to improve word embedding quality.

Findings

01 Enhanced word representations with better semantic accuracy

02 Reduced influence of irrelevant context words during training

03 Improved performance on downstream NLP tasks

Abstract

The skip-gram (SG) model learns word representation by predicting the words surrounding a center word from unstructured text data. However, not all words in the context window contribute to the meaning of the center word. For example, less relevant words could be in the context window, hindering the SG model from learning a better quality representation. In this paper, we propose an enhanced version of the SG that leverages context information to produce word representation. The proposed model, Contextual Skip-gram, is designed to predict contextual words with both the center words and the context information. This simple idea helps to reduce the impact of irrelevant words on the training process, thus enhancing the final performance.
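To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. `skipgram_pairs` shows the standard skip-gram setup of pairing a center word with every word in its window. The other two functions are assumptions made purely for illustration: the abstract does not say how the context information is computed or combined, so `context_vector` takes it to be the mean embedding of the window words, and `contextual_score` adds it to the center vector before scoring a candidate context word. All function names and the `emb` dictionary are hypothetical.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Standard skip-gram: pair each center word with every word in its window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def context_vector(tokens, i, window, emb):
    """ASSUMPTION: 'context information' is the mean embedding of the window words."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    neighbours = [emb[tokens[j]] for j in range(lo, hi) if j != i]
    dim = len(emb[tokens[i]])
    if not neighbours:
        # A one-word sequence has no neighbours; fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[d] for vec in neighbours) / len(neighbours) for d in range(dim)]


def contextual_score(center_vec, ctx_vec, target_out_vec):
    """ASSUMPTION: combine center and context by addition, then apply a sigmoid."""
    combined = [c + x for c, x in zip(center_vec, ctx_vec)]
    logit = sum(h * o for h, o in zip(combined, target_out_vec))
    return 1.0 / (1.0 + math.exp(-logit))
```

In plain skip-gram the score depends only on the center vector. Under the assumptions above, a candidate word that agrees with the rest of the window receives a higher score than one that matches the center word alone, which is one plausible way irrelevant context words could be down-weighted during training.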

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

harshvivek14/NLP-Word-Embedding-Techniques (tf)

Videos

No videos yet.

Taxonomy

Topics

Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems