Extremal GloVe: Theoretically Accurate Distributed Word Embedding by Tail Inference

Hao Wang

arXiv:2204.13009·cs.CL·April 28, 2022

TL;DR

This paper introduces a theoretically grounded version of GloVe, leveraging extreme value analysis to improve the selection of the weighting function and exponent, resulting in more accurate word embeddings.

Contribution

It reformulates GloVe's loss function using extreme value theory and optimally chooses parameters for enhanced theoretical soundness.

Findings

01

The new GloVe version is competitive with existing embeddings.

02

Optimal parameters derived from theory improve embedding quality.

03

Initial GloVe formulation is a special case of the proposed method.

Abstract

Distributed word embeddings such as Word2Vec and GloVe have been widely adopted in industrial settings. Major technical applications of GloVe include recommender systems and natural language processing. The fundamental theory behind GloVe relies on the selection of a weighting function in the weighted least squares formulation that computes the powered ratio of the word occurrence count and the maximum word count in the corpus. However, the initial formulation of GloVe is not theoretically sound in two respects: the selection of the weighting function and the choice of its power exponent are both ad hoc. In this paper, we utilize the theory of extreme value analysis and propose a theoretically accurate version of GloVe. By reformulating the weighted least squares loss function as the expected loss function and accurately choosing the power exponent, we create a theoretically accurate version of…
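For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the original GloVe weighting function and weighted least squares loss, not of the extremal variant proposed in the paper. The function names `glove_weight` and `glove_loss` are hypothetical, and the defaults `x_max=100` and `alpha=0.75` are the commonly cited settings of the original GloVe, assumed here for illustration only; the exponent this paper derives is not given in the text above.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Original GloVe weight: (x / x_max)**alpha below the cutoff, 1 otherwise.

    x_max and alpha are the ad hoc choices the abstract criticizes;
    the defaults are assumptions taken from the original GloVe setup.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
    """Weighted least squares loss summed over nonzero co-occurrence counts.

    W, W_ctx : (V, d) word and context embedding matrices
    b, b_ctx : (V,) word and context bias vectors
    X        : (V, V) dense co-occurrence count matrix
    """
    i, j = np.nonzero(X)                      # only observed pairs contribute
    counts = X[i, j]
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    residual = pred - np.log(counts)
    return float(np.sum(glove_weight(counts, x_max, alpha) * residual ** 2))
```

In this baseline, pairs with small counts are down-weighted by the power law and pairs at or above `x_max` all receive weight 1; it is the form of this weight and the value of `alpha` that the paper aims to justify through tail inference.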

Taxonomy

Methods: GloVe Embeddings