Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings
Danushka Bollegala

TL;DR
This paper introduces unsupervised weighted concatenation methods for meta word embeddings, analyzes them theoretically, and shows empirically that they outperform previous approaches on multiple benchmarks.
Contribution
The paper provides a theoretical framework that treats weighted concatenation as spectrum matching, and proposes unsupervised methods for learning the concatenation weights used to build meta-embeddings.
Findings
Weighted concatenation aligns with spectrum matching principles
Proposed methods outperform previous meta-embedding techniques
Achieves better accuracy on multiple benchmark datasets
Abstract
Given multiple source word embeddings learnt using diverse algorithms and lexical resources, meta word embedding learning methods attempt to learn more accurate and wide-coverage word embeddings. Prior work on meta-embedding has repeatedly found simple vector concatenation of the source embeddings to be a competitive baseline. However, it remains unclear why and when simple vector concatenation can produce accurate meta-embeddings. We show that weighted concatenation can be seen as a spectrum matching operation between each source embedding and the meta-embedding, minimising the pairwise inner-product loss. Following this theoretical analysis, we propose two unsupervised methods to learn the optimal concatenation weights for creating meta-embeddings from a given set of source embeddings. Experimental results on multiple benchmark datasets show that the…
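To make the idea of weighted concatenation concrete, here is a minimal sketch in Python with NumPy. It is not the paper's weight-learning procedure, which the abstract does not detail: the function name `weighted_concat`, the random source matrices, and the weights 0.7 and 1.3 are all illustrative assumptions. The sketch only shows the operation itself, plus the general identity that the inner products between weighted-concatenated meta-embeddings equal the sum of the source inner products scaled by the squared weights.

```python
import numpy as np


def weighted_concat(sources, weights):
    """Scale each source embedding matrix by its weight, then concatenate.

    sources: list of arrays of shape (n_words, d_j), rows aligned by word.
    weights: list of scalars c_j, one per source (hypothetical values here).
    """
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per source embedding")
    n_words = sources[0].shape[0]
    if any(E.shape[0] != n_words for E in sources):
        raise ValueError("all sources must cover the same words in the same order")
    # Concatenate along the feature axis: result has shape (n_words, sum of d_j).
    return np.hstack([c * E for c, E in zip(weights, sources)])


# Toy data (assumption): two random sources for the same 5 words.
rng = np.random.default_rng(0)
E1 = rng.standard_normal((5, 3))
E2 = rng.standard_normal((5, 4))
weights = [0.7, 1.3]  # placeholder weights, not learnt

M = weighted_concat([E1, E2], weights)

# Pairwise inner products of the meta-embeddings ...
gram_meta = M @ M.T
# ... equal the squared-weight sum of the source inner products.
gram_sum = sum(c ** 2 * (E @ E.T) for c, E in zip(weights, [E1, E2]))
```

Because each weight enters the inner products through its square, changing a weight rescales how much that source contributes to every pairwise inner product between meta-embeddings, which is the quantity the inner-product loss in the abstract is defined over.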
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
