# Word Embedding based on Low-Rank Doubly Stochastic Matrix Decomposition

**Authors:** Denis Sedov, Zhirong Yang

arXiv: 1812.10401 · 2018-12-27

## TL;DR

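This paper proposes a neighbor embedding method for words that directly learns an embedding simplex, so that similarities between the mapped words deviate as little as possible from the input neighborhoods. The method is built on two-step random walks between words via topics, which helps reveal topics among the words; in the authors' experiments it compares favorably with another existing word embedding approach on various queries.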

## Contribution

The paper presents a neighbor embedding technique, based on low-rank doubly stochastic matrix decomposition, that optimizes similarities in the embedding space during learning and better reveals topics among words.

## Key findings

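- Gives more favorable results on various queries than the one existing word embedding approach it is compared against
- Better reveals the underlying topics among words
- Optimizes similarity in the embedding space as part of learning, which most current word embedding approaches do not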

## Abstract

Word embedding, which encodes words into vectors, is an important starting point in natural language processing and commonly used in many text-based machine learning tasks. However, in most current word embedding approaches, the similarity in embedding space is not optimized in the learning. In this paper we propose a novel neighbor embedding method which directly learns an embedding simplex where the similarities between the mapped words are optimal in terms of minimal discrepancy to the input neighborhoods. Our method is built upon two-step random walks between words via topics and thus able to better reveal the topics among the words. Experiment results indicate that our method, compared with another existing word embedding approach, is more favorable for various queries.
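To make the "two-step random walks between words via topics" idea concrete, here is a minimal sketch. This summary does not give the paper's formulas, so the construction below is an assumption: it uses the form commonly seen in low-rank doubly stochastic matrix decomposition, where each word is a row of a nonnegative matrix `W` on the probability simplex over topics, and the similarity of words `i` and `j` is `A[i][j] = sum_k W[i][k] * W[j][k] / sum_v W[v][k]`. The helper names (`row_softmax`, `two_step_similarity`, `nearest_words`) and the toy data are hypothetical, and no learning step is shown.

```python
import math
import random


def row_softmax(scores):
    """Map each row of real scores to a point on the probability simplex."""
    rows = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def two_step_similarity(W):
    """Assumed form: A[i][j] = sum_k W[i][k] * W[j][k] / sum_v W[v][k].

    Read as a random walk: word i -> topic k with probability W[i][k],
    then topic k -> word j with probability W[j][k] / sum_v W[v][k].
    """
    n, r = len(W), len(W[0])
    col_sums = [sum(W[v][k] for v in range(n)) for k in range(r)]
    return [
        [sum(W[i][k] * W[j][k] / col_sums[k] for k in range(r)) for j in range(n)]
        for i in range(n)
    ]


def nearest_words(A, i, top=3):
    """Indices of the words most similar to word i, excluding i itself."""
    others = [j for j in range(len(A)) if j != i]
    return sorted(others, key=lambda j: A[i][j], reverse=True)[:top]


# Toy data (hypothetical): 6 words, 2 topics, random unnormalized scores.
rng = random.Random(0)
scores = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(6)]
W = row_softmax(scores)          # each row sums to 1
A = two_step_similarity(W)       # 6 x 6 similarity matrix of rank at most 2

row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][j] for i in range(6)) for j in range(6)]
print("row sums:", [round(s, 6) for s in row_sums])
print("column sums:", [round(s, 6) for s in col_sums])
print("words nearest to word 0:", nearest_words(A, 0))
```

Under this assumed form, `A` is symmetric and every row and column sums to 1, because summing over `j` cancels the topic normalizer and leaves the row sum of `W`. That is the sense in which the similarity matrix is doubly stochastic and low rank; fitting `W` so that `A` stays close to the input neighborhoods is the part the paper addresses and this sketch omits.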

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.10401/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1812.10401/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1812.10401/full.md

---
Source: https://tomesphere.com/paper/1812.10401