TL;DR
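This paper introduces LexVec, a word embedding method that applies low-rank, weighted factorization to the Positive Pointwise Mutual Information (PPMI) matrix, trained by stochastic gradient descent with negative sampling. It matches and often outperforms state-of-the-art methods on word similarity and analogy benchmarks.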
Contribution
LexVec combines low-rank, weighted factorization of the PPMI matrix with negative sampling, using a weighting scheme that penalizes errors on frequent co-occurrences more heavily.
Findings
LexVec matches and often outperforms state-of-the-art methods on word similarity tasks.
LexVec captures semantic relationships between words, as measured on word analogy tasks.
These results hold on many, though not all, of the evaluated benchmarks.
Abstract
In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
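The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small dense co-occurrence count matrix, a squared-error loss, and uniform sampling of negative contexts; the function names (`ppmi_matrix`, `factorize_ppmi`, `_sgd_step`) and all hyperparameters are hypothetical. Observed pairs are sampled in proportion to their counts, so frequent co-occurrences receive more updates, which is one simple way to weight their errors more heavily.

```python
import numpy as np


def ppmi_matrix(counts):
    """Positive PMI from a dense word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)


def _sgd_step(W, C, w, c, value, lr):
    """One SGD step on 0.5 * (W[w] . C[c] - value)^2, updating in place."""
    err = W[w] @ C[c] - value
    grad_w = err * C[c]
    grad_c = err * W[w]
    W[w] -= lr * grad_w
    C[c] -= lr * grad_c


def factorize_ppmi(counts, dim=2, epochs=200, lr=0.05, negatives=1, seed=0):
    """Low-rank factorization of the PPMI matrix by sampled SGD (sketch)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    target = ppmi_matrix(counts)
    n_words, n_ctx = counts.shape
    W = rng.normal(scale=0.1, size=(n_words, dim))  # word vectors
    C = rng.normal(scale=0.1, size=(n_ctx, dim))    # context vectors

    # Observed pairs are drawn in proportion to their counts, so errors on
    # frequent co-occurrences are corrected more often.
    pairs = np.argwhere(counts > 0)
    probs = counts[counts > 0] / counts.sum()

    for _ in range(epochs):
        for idx in rng.choice(len(pairs), size=len(pairs), p=probs):
            w, c = pairs[idx]
            _sgd_step(W, C, w, c, target[w, c], lr)
            # Assumption: negatives are drawn uniformly over contexts.
            for c_neg in rng.integers(0, n_ctx, size=negatives):
                _sgd_step(W, C, w, c_neg, target[w, c_neg], lr)
    return W, C


counts = np.array([[10, 0, 2], [0, 8, 1], [3, 1, 0]])
W, C = factorize_ppmi(counts)
approx = W @ C.T  # low-rank approximation of ppmi_matrix(counts)
```

The negative samples pull the dot product toward the PPMI value of the sampled pair, which is zero for pairs that never co-occur, so unobserved cells still influence the factorization.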