RPD: A Distance Function Between Word Embeddings
Xuhui Zhou, Zaixiang Zheng, Shujian Huang

TL;DR
This paper introduces RPD, a new metric to measure the distance between different word embedding spaces, helping to understand how various training methods and corpora influence embeddings.
Contribution
The paper proposes the Relative pairwise inner Product Distance (RPD), a novel unified metric for comparing and analyzing differences between sets of word embeddings.
Findings
RPD effectively quantifies differences between embedding spaces.
Different algorithms and training data significantly affect embedding relations.
RPD provides insights into the properties of various word embeddings.
Abstract
It is well understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e., how far different sets of embeddings deviate from each other. In this paper, we propose a novel metric called Relative pairwise inner Product Distance (RPD) to quantify the distance between different sets of word embeddings. This metric has a unified scale for comparing different sets of word embeddings. Based on the properties of RPD, we study the relations among word embeddings produced by different algorithms systematically and investigate the influence of different training processes and corpora. The results shed light on the poorly understood word embeddings and justify RPD as a measure of the distance between embedding spaces.
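The abstract names the metric but does not state its formula. The sketch below is only an illustration of what a "relative pairwise inner product" distance could look like, under assumptions that are not taken from this page: each embedding matrix has one row per word over a shared vocabulary, rows are scaled to unit length, the pairwise inner products are collected in a Gram matrix, and the squared Frobenius norm of the difference between the two Gram matrices is divided by the product of their norms. The function name `rpd_sketch`, the helper names, the unit-length scaling, and the factor of 0.5 are all hypothetical, and the definition in the paper may differ.

```python
import math


def normalize_rows(emb):
    """Scale each word vector to unit length (assumes no all-zero rows)."""
    out = []
    for row in emb:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row])
    return out


def gram(emb):
    """Matrix of pairwise inner products between all word vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in emb] for u in emb]


def frobenius(mat):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in mat for x in row))


def rpd_sketch(emb_a, emb_b):
    """Hypothetical relative distance between two embedding sets.

    Both inputs must list the same words in the same order; the vector
    dimensions may differ because only the Gram matrices are compared.
    """
    gram_a = gram(normalize_rows(emb_a))
    gram_b = gram(normalize_rows(emb_b))
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(gram_a, gram_b)]
    # Assumed normalization: divide by the product of the two Gram norms.
    return 0.5 * frobenius(diff) ** 2 / (frobenius(gram_a) * frobenius(gram_b))


# Toy data: three "words" in two dimensions.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rotated = [[-y, x] for x, y in emb]  # same space rotated by 90 degrees
other = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]

print(rpd_sketch(emb, rotated))  # rotation leaves inner products unchanged
print(rpd_sketch(emb, other))    # a genuinely different geometry
```

Because only inner products enter the computation, a rotated copy of an embedding space is at distance zero from the original in this sketch, while the division by the Gram norms keeps the value on a comparable scale across embedding sets of different magnitude.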
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
