Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates

Raj Patel; Carlotta Domeniconi

arXiv:1910.10491·cs.CL·October 24, 2019

TL;DR

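This paper introduces Estimator Vectors (EV), a neural network model that jointly learns word, subword, and context clue representations. Combining the subword and context clue embeddings yields estimates for out-of-vocabulary (OOV) words that are reported to be competitive with state-of-the-art OOV estimation techniques.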

Contribution

The Estimator Vectors model estimates an embedding for an OOV word by combining its subword embeddings with the embeddings of its context clues, addressing the failure of word2vec-style models to approximate words outside their training vocabulary.

Findings

01 Jointly learning word, subword, and context clue representations enriches the word vectors

02 Combining the corresponding context clue and subword embeddings gives strong estimates for OOV words

03 The model is competitive with state-of-the-art OOV estimation techniques

Abstract

Semantic representations of words have been successfully extracted from unlabeled corpuses using neural network models like word2vec. These representations are generally high quality and are computationally inexpensive to train, making them popular. However, these approaches generally fail to approximate out of vocabulary (OOV) words, a task humans can do quite easily, using word roots and context clues. This paper proposes a neural network model that learns high quality word representations, subword representations, and context clue representations jointly. Learning all three types of representations together enhances the learning of each, leading to enriched word vectors, along with strong estimates for OOV words, via the combination of the corresponding context clue and subword embeddings. Our model, called Estimator Vectors (EV), learns strong word embeddings and is competitive with…
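The abstract says OOV estimates come from combining the corresponding context clue and subword embeddings, but does not give the combination rule here. The sketch below is only an illustration of that general idea, not the authors' method: it assumes character n-grams as subwords, simple averaging as the combination, and hypothetical lookup tables `subword_emb` and `context_emb` that a trained model would supply.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of the word, padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def estimate_oov(word, context, subword_emb, context_emb, dim):
    """Estimate a vector for an unseen word (assumed rule: average of averages).

    subword_emb and context_emb are hypothetical dicts mapping n-grams and
    context words to vectors; entries missing from either table are skipped.
    """
    sub = [subword_emb[g] for g in char_ngrams(word) if g in subword_emb]
    ctx = [context_emb[w] for w in context if w in context_emb]
    # Use only the sources that produced at least one vector.
    parts = [mean_vector(vs, dim) for vs in (sub, ctx) if vs]
    return mean_vector(parts, dim)


# Toy tables with hand-picked values, purely for illustration.
subword_emb = {"<ca": [1.0, 0.0], "cat": [3.0, 0.0]}
context_emb = {"pet": [0.0, 4.0]}
estimate = estimate_oov("cat", ["pet", "unknown"], subword_emb, context_emb, dim=2)
print(estimate)
```

If only one source has known entries, the estimate falls back to that source alone; if neither does, the result is a zero vector. A learned model could weight the two sources differently rather than averaging them equally.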
