An Ensemble Method to Produce High-Quality Word Embeddings (2016)
Robyn Speer, Joshua Chin

TL;DR
This paper introduces an ensemble approach that combines multiple word embedding methods and semantic networks to produce high-quality, multilingual word representations that outperform previous models on various similarity tasks.
Contribution
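The paper presents an ensemble method that merges GloVe and word2vec embeddings with structured knowledge from the semantic networks ConceptNet and PPDB into a common multilingual representation, achieving state-of-the-art results on many word-similarity evaluations.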
Findings
Achieves state-of-the-art performance on word similarity benchmarks
Scores 16% higher than the previous best known system on a rare-word similarity evaluation (Luong et al., 2013)
Produces high-quality, multilingual word embeddings
Abstract
A currently successful approach to computational semantics is to represent words as embeddings in a machine-learned vector space. We present an ensemble method that combines embeddings produced by GloVe (Pennington et al., 2014) and word2vec (Mikolov et al., 2013) with structured knowledge from the semantic networks ConceptNet (Speer and Havasi, 2012) and PPDB (Ganitkevitch et al., 2013), merging their information into a common representation with a large, multilingual vocabulary. The embeddings it produces achieve state-of-the-art performance on many word-similarity evaluations. Its score on an evaluation of rare words (Luong et al., 2013) is 16% higher than that of the previous best known system.
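Word-similarity evaluations of the kind the abstract mentions are commonly scored by comparing a model's cosine similarities against human ratings using Spearman rank correlation. The following is a minimal sketch of that scoring procedure, not the authors' evaluation code. The function names, the toy two-dimensional vectors, and the toy ratings are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate(embeddings, gold):
    """Correlate model similarities with human ratings for word pairs."""
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in gold]
    human_scores = [rating for _, _, rating in gold]
    return spearman(model_scores, human_scores)

# Toy data (hypothetical, not from the paper or any benchmark).
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_gold = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
]

rho = evaluate(toy_embeddings, toy_gold)
print(f"Spearman rho on toy data: {rho:.3f}")
```

Because the measure depends only on rank order, a model is rewarded for ordering word pairs the way human raters do, regardless of the absolute scale of its similarity values.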
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: GloVe Embeddings
