word2vec Parameter Learning Explained

Xin Rong

arXiv:1411.2738 · cs.CL · June 7, 2016 · 659 cites

TL;DR

This paper provides a detailed, mathematically rigorous explanation of the parameter learning process in word2vec models, including derivations, interpretations, and an interactive demo to aid understanding.

Contribution

It offers the first comprehensive, detailed derivation and explanation of word2vec parameter updates, including advanced optimization techniques, for researchers without neural network expertise.

Findings

01 Clarifies the mathematical derivation of word2vec updates

02 Provides intuitive interpretations of gradient equations

03 Includes an interactive demo for better understanding

Abstract

The word2vec model and application by Mikolov et al. have attracted a great amount of attention in the past two years. The vector representations of words learned by word2vec models have been shown to carry semantic meanings and are useful in various NLP tasks. As an increasing number of researchers would like to experiment with word2vec or similar techniques, I notice a lack of material that comprehensively explains the parameter learning process of word embedding models in detail, which prevents researchers who are not experts in neural networks from understanding the working mechanism of such models. This note provides detailed derivations and explanations of the parameter update equations of the word2vec models, including the original continuous bag-of-words (CBOW) and skip-gram (SG) models, as well as advanced optimization techniques, including hierarchical softmax and…
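To make the phrase "parameter update equations" concrete, here is a minimal sketch of one stochastic gradient step for the simplest case: a single context word predicting a single target word through a full softmax. It is an illustration under stated assumptions, not the paper's code or the original word2vec implementation. The names `W_in`, `W_out`, `sgd_step`, the toy vocabulary size, and the learning rate are all hypothetical, and hierarchical softmax is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(W_in, W_out, context_id, target_id, lr=0.1):
    """One SGD step for a one-word-context model with a full softmax.

    W_in[i] is the input vector of word i, W_out[j] the output vector
    of word j (hypothetical names). Returns the loss before the update.
    """
    h = W_in[context_id][:]  # hidden layer: a copy of the context word's input vector
    scores = [sum(a * b for a, b in zip(v_out, h)) for v_out in W_out]
    probs = softmax(scores)
    loss = -math.log(probs[target_id])

    # Prediction error per output word: predicted probability minus one-hot target.
    errors = [p - (1.0 if j == target_id else 0.0) for j, p in enumerate(probs)]

    # Gradient with respect to the hidden layer, computed before W_out changes.
    dim = len(h)
    grad_h = [sum(errors[j] * W_out[j][k] for j in range(len(W_out)))
              for k in range(dim)]

    # Update every output vector, then the one input vector that was used.
    for j, e in enumerate(errors):
        for k in range(dim):
            W_out[j][k] -= lr * e * h[k]
    for k in range(dim):
        W_in[context_id][k] -= lr * grad_h[k]
    return loss


# Toy setup (assumed values): 5-word vocabulary, 3-dimensional vectors.
random.seed(0)
vocab_size, dim = 5, 3
W_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

# Repeating the same (context, target) pair should drive the loss down.
losses = [sgd_step(W_in, W_out, context_id=1, target_id=3) for _ in range(50)]
```

Each step touches every output vector, so the cost grows with vocabulary size. That cost is the motivation for the approximations the abstract lists, such as hierarchical softmax.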

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Hierarchical Softmax · Softmax