A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution
Shaohua Li, Jun Zhu, Chunyan Miao

TL;DR
This paper introduces a generative word embedding model that is easy to interpret and can serve as a basis for latent factor models. It outperforms other matrix factorization methods and is competitive with neural embedding models such as word2vec on benchmark datasets.
Contribution
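The paper proposes a generative word embedding model whose inference reduces to a low rank weighted positive semidefinite approximation problem. The model is interpretable, and its optimization scales through eigendecomposition on a submatrix followed by online blockwise regression.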
Findings
Competitive with word2vec on benchmarks
Outperforms other matrix factorization methods
Scalable optimization via eigendecomposition
Abstract
Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However, some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret, and can serve as a basis of more sophisticated latent factor models. The model inference reduces to a low rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is…
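To make the two optimization steps named in the abstract more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes an unweighted simplification (the paper's problem is weighted), and the function names `low_rank_psd_factor` and `regress_block`, the target matrix `G`, and the core/non-core split are all hypothetical choices made for illustration. The first function factors a symmetric "core" submatrix by eigendecomposition; the second fits embeddings for the remaining words by ordinary least squares against the core embeddings.

```python
import numpy as np


def low_rank_psd_factor(G, d):
    """Return V (n x d) so that V @ V.T is a rank-d PSD approximation of G.

    Unweighted simplification: keep the d largest eigenvalues of the
    symmetrized matrix and clip negative ones to zero.
    """
    G = (G + G.T) / 2.0                      # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(G)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]      # indices of the d largest
    lam = np.clip(eigvals[top], 0.0, None)   # drop negative eigenvalues
    return eigvecs[:, top] * np.sqrt(lam)    # scale each eigenvector column


def regress_block(V_core, G_block):
    """Fit embeddings for a block of non-core words.

    G_block has shape (n_core, n_new). Solves V_core @ X ~= G_block in the
    least-squares sense and returns X.T with shape (n_new, d).
    """
    X, *_ = np.linalg.lstsq(V_core, G_block, rcond=None)
    return X.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(8, 3))              # hypothetical "true" embeddings
    G = A @ A.T                              # hypothetical target matrix
    V_core = low_rank_psd_factor(G[:5, :5], d=3)
    V_new = regress_block(V_core, G[:5, 5:])
    print(V_core.shape, V_new.shape)
```

In this sketch only the core submatrix is eigendecomposed, so the expensive step scales with the core vocabulary size rather than the full vocabulary; the remaining words are handled one block at a time by regression.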
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
