An Ensemble Method to Produce High-Quality Word Embeddings (2016)

Robyn Speer; Joshua Chin

arXiv:1604.01692 · cs.CL · December 20, 2019 · 49 cites

TL;DR

This paper introduces an ensemble approach that combines word embedding methods with semantic networks to produce multilingual word representations that reach state-of-the-art performance on many word-similarity evaluations.

Contribution

It presents an ensemble method that integrates GloVe, word2vec, ConceptNet, and PPDB into a single multilingual embedding space and achieves state-of-the-art results on many word-similarity evaluations.

Findings

01 Achieves state-of-the-art performance on many word-similarity evaluations

02 Scores ρ = .596 on a rare-words evaluation (Luong et al., 2013), 16% higher than the previous best known system

03 Produces multilingual word embeddings with a large vocabulary

Abstract

A currently successful approach to computational semantics is to represent words as embeddings in a machine-learned vector space. We present an ensemble method that combines embeddings produced by GloVe (Pennington et al., 2014) and word2vec (Mikolov et al., 2013) with structured knowledge from the semantic networks ConceptNet (Speer and Havasi, 2012) and PPDB (Ganitkevitch et al., 2013), merging their information into a common representation with a large, multilingual vocabulary. The embeddings it produces achieve state-of-the-art performance on many word-similarity evaluations. Its score of $ρ = .596$ on an evaluation of rare words (Luong et al., 2013) is 16% higher than the previous best known system.
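
The ρ reported above is a rank correlation between a model's similarity scores and human similarity judgments. As an illustration only, the following is a minimal sketch of how such a word-similarity score could be computed, assuming cosine similarity as the model score and Spearman's ρ with average ranks for ties. The toy vectors, the ratings, and the helper names (`cosine`, `average_ranks`, `spearman`, `evaluate`) are hypothetical, and skipping out-of-vocabulary pairs is a simplifying assumption rather than the paper's evaluation protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate(embeddings, gold_pairs):
    """Correlate model similarities with human ratings.

    Assumption: pairs with an out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for word1, word2, rating in gold_pairs:
        if word1 in embeddings and word2 in embeddings:
            model_scores.append(cosine(embeddings[word1], embeddings[word2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, not taken from any real benchmark.
embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
gold_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.5),
    ("dog", "truck", 2.0),
]
rho = evaluate(embeddings, gold_pairs)
```

In this toy case the model ranks the four pairs in the same order as the human ratings, so `rho` comes out at the maximum of 1 (up to floating-point rounding). Because ρ depends only on rank order, it rewards an embedding for ordering word pairs the way people do, regardless of the absolute scale of its similarity scores.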

Code & Models

Repositories

LuminosoInsight/conceptnet-vector-ensemble (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies

Methods: GloVe Embeddings