Attention Word Embedding

Shashank Sonkar; Andrew E. Waters; Richard G. Baraniuk

arXiv:2006.00988·cs.CL·June 2, 2020

TL;DR

This paper introduces Attention Word Embedding (AWE), a word embedding model that integrates an attention mechanism into CBOW so that context words are weighted by their predictive value rather than equally.

Contribution

The paper proposes AWE, which integrates the attention mechanism into the CBOW model, and AWE-S, which incorporates subword information. Both are intended to produce better word embeddings for NLP tasks.

Findings

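01. AWE and AWE-S outperform state-of-the-art word embedding models on a variety of word similarity datasets.

02. AWE and AWE-S also outperform those models when used to initialize NLP models.

03. The attention mechanism lets the model weight context words by predictive value instead of weighting them equally.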

Abstract

Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as a context to predict it. A limitation of CBOW is that it weights the context words equally when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform the state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP…
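The contrast the abstract draws between equal weighting and attention weighting can be illustrated with a small NumPy sketch. This is not the paper's formulation: the abstract does not give the AWE scoring function, so the sketch assumes a generic attention form (a softmax over dot products between per-word key vectors and a single query vector). The names `embed`, `keys`, `query`, `cbow_context`, and `attention_context` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 10, 4
# Hypothetical parameter tables; names and shapes are assumptions, not from the paper.
embed = rng.normal(size=(vocab_size, dim))  # context word embeddings
keys = rng.normal(size=(vocab_size, dim))   # per-word key vectors (assumed)
query = rng.normal(size=dim)                # a single query vector (assumed)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def cbow_context(context_ids):
    """CBOW-style context vector: every context word gets equal weight."""
    return embed[context_ids].mean(axis=0)


def attention_context(context_ids):
    """Attention-style context vector: weights come from a softmax over
    key-query dot products (a generic choice, assumed for this sketch)."""
    scores = keys[context_ids] @ query
    weights = softmax(scores)
    return weights @ embed[context_ids], weights


context_ids = np.array([1, 3, 5, 7])  # indices of the unmasked context words
uniform = cbow_context(context_ids)
weighted, weights = attention_context(context_ids)
```

In this sketch, `weights` is a probability distribution over the context words, and when all scores are equal the attention-weighted vector reduces to the plain CBOW average. That is the sense in which attention generalizes equal weighting here.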
