Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

Alexandre Salle; Marco Idiart; Aline Villavicencio

arXiv:1606.00819 · cs.CL · June 8, 2016


TL;DR

This paper introduces LexVec, a word embedding method that applies weighted, low-rank matrix factorization with negative sampling to the Positive Pointwise Mutual Information (PPMI) matrix. It performs competitively with, and often better than, existing methods on word similarity and analogy benchmarks.

Contribution

LexVec combines low-rank, weighted factorization of the PPMI matrix, trained with stochastic gradient descent, with window sampling and negative sampling to produce improved word representations.
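For reference, the objective can be sketched as below. This follows our reading of the LexVec formulation rather than text reproduced from this page, so treat the exact form as an assumption: $W_w$ and $\tilde{W}_c$ are the word and context vectors, $\#(c)$ is the co-occurrence count of context $c$, $k$ is the number of negative samples drawn from a noise distribution $P_n$, and $\alpha$ is a context-distribution smoothing exponent (0.75 is a common choice, assumed here).

```latex
P_\alpha(c) = \frac{\#(c)^{\alpha}}{\sum_{c'} \#(c')^{\alpha}}, \qquad
\mathrm{PPMI}^{*}_{wc} = \max\!\left(0,\; \log\frac{P(w,c)}{P(w)\,P_\alpha(c)}\right)

L_{wc} = \tfrac{1}{2}\left(W_w^{\top}\tilde{W}_c - \mathrm{PPMI}^{*}_{wc}\right)^{2}

L_{w} = \tfrac{1}{2}\sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left(W_w^{\top}\tilde{W}_{w_i} - \mathrm{PPMI}^{*}_{w w_i}\right)^{2}
```

Under window sampling, $L_{wc}$ is minimized for every $(w, c)$ pair encountered while scanning the corpus, so pairs that co-occur often receive more updates. This is one way to realize a heavier penalty on frequent co-occurrences, while $L_w$ pulls the model's predictions for randomly drawn contexts toward their PPMI values, which are often zero.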

Findings

01. LexVec matches or outperforms existing methods on word similarity tasks.

02. LexVec embeddings capture semantic relationships between words, as measured by word analogy tasks.

03. The method performs consistently well across multiple word similarity and analogy benchmarks.

Abstract

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
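The pipeline described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the official alexandres/lexvec implementation, and several choices are our own assumptions: a symmetric context window, a smoothing exponent alpha = 0.75 used both for PPMI and for the noise distribution, a squared-error loss against PPMI, and plain SGD with a fixed learning rate. The function names `window_pairs`, `ppmi_table`, and `train` are hypothetical. The frequency weighting is implicit: the loop visits each (word, context) pair once per occurrence in the corpus, so frequent co-occurrences receive proportionally more updates.

```python
import math
import random
from collections import Counter


def window_pairs(corpus, window):
    """Yield every (word, context) pair inside a symmetric window."""
    for sentence in corpus:
        for i, w in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield w, sentence[j]


def ppmi_table(counts, alpha=0.75):
    """PPMI with context-distribution smoothing (alpha is an assumed default)."""
    total = sum(counts.values())
    word_tot, ctx_tot = Counter(), Counter()
    for (w, c), n in counts.items():
        word_tot[w] += n
        ctx_tot[c] += n
    smoothed = {c: n ** alpha for c, n in ctx_tot.items()}
    norm = sum(smoothed.values())
    ppmi = {}
    for (w, c), n in counts.items():
        pmi = math.log((n / total) / ((word_tot[w] / total) * (smoothed[c] / norm)))
        ppmi[(w, c)] = max(0.0, pmi)  # clip negative PMI to zero
    return ppmi, smoothed


def train(corpus, dim=10, window=2, negatives=3, lr=0.025, epochs=5, alpha=0.75, seed=0):
    """Window-sampling SGD on 0.5 * (W_w . C_c - PPMI(w, c))**2 (hypothetical sketch)."""
    rng = random.Random(seed)
    counts = Counter(window_pairs(corpus, window))
    ppmi, smoothed = ppmi_table(counts, alpha)
    vocab = sorted({w for s in corpus for w in s})
    W = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    C = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    noise = sorted(smoothed)  # assumed noise distribution: smoothed context counts
    noise_weights = [smoothed[c] for c in noise]

    def sgd_step(w, c):
        target = ppmi.get((w, c), 0.0)  # pairs never seen together have PPMI 0
        wv, cv = W[w], C[c]
        err = sum(a * b for a, b in zip(wv, cv)) - target
        for k in range(dim):
            gw, gc = err * cv[k], err * wv[k]  # gradients of 0.5 * err**2
            wv[k] -= lr * gw
            cv[k] -= lr * gc

    for _ in range(epochs):
        for w, c in window_pairs(corpus, window):
            sgd_step(w, c)  # observed pair, visited once per occurrence
            for neg in rng.choices(noise, weights=noise_weights, k=negatives):
                sgd_step(w, neg)  # negative sample; its target is its own PPMI value
    return W, C


if __name__ == "__main__":
    toy = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    W, C = train(toy, dim=8)
    print({w: [round(x, 3) for x in v[:3]] for w, v in W.items()})
```

Each entry of `W` is a word vector. Unlike word2vec-style negative sampling, which pushes negatives toward a fixed target, this sketch regresses each sampled pair toward its PPMI value, so a negative sample that actually co-occurs with the word is not penalized for doing so. Whether to use `W` alone or combine it with `C` is a design choice the sketch leaves open.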


Code & Models

Repositories

alexandres/lexvec (official)
