Language Model Metrics and Procrustes Analysis for Improved Vector Transformation of NLP Embeddings
Thomas Conley, Jugal Kalita

TL;DR
This paper introduces a novel metric called Language Model Distance (LMD) for evaluating NLP embedding transformations, demonstrating its effectiveness through bilingual word mapping experiments using Procrustes analysis.
Contribution
The paper proposes LMD as a new way to measure linguistic similarity in embeddings, aligning vector transformation evaluation with language model understanding.
Findings
LMD outperforms traditional distance metrics in evaluating embedding transformations.
Applying LMD to Procrustes-based bilingual mapping improves accuracy.
The method bridges the gap between mathematical and linguistic measures of similarity.
Abstract
Artificial neural networks are mathematical models at their core. This truism presents some fundamental difficulty when networks are tasked with Natural Language Processing. A key problem lies in measuring the similarity or distance among vectors in NLP embedding space, since the mathematical concept of distance does not always agree with the linguistic concept. We suggest that the best way to measure linguistic distance among vectors is by employing the Language Model (LM) that created them. We introduce Language Model Distance (LMD) for measuring accuracy of vector transformations based on the Distributional Hypothesis (LMD Accuracy). We show the efficacy of this metric by applying it to a simple neural network learning the Procrustes algorithm for bilingual word mapping.
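To make the bilingual mapping setup concrete, the following is a minimal sketch of orthogonal Procrustes alignment scored with a conventional cosine nearest-neighbor accuracy. It is not the authors' implementation: the abstract describes a neural network that learns the mapping, whereas this sketch swaps in the closed-form SVD solution, and it does not implement LMD, whose definition is not given in this summary. The function names, the synthetic data, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def fit_orthogonal_procrustes(X, Y):
    """Return the orthogonal matrix W that minimizes ||X @ W - Y||_F.

    Closed-form solution: take the SVD of X^T Y and multiply the two
    orthogonal factors (the singular values are discarded).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def nearest_neighbor_accuracy(mapped, targets):
    """Fraction of mapped rows whose cosine nearest neighbor among
    `targets` is the row with the same index (the assumed translation)."""
    a = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    predicted = np.argmax(a @ b.T, axis=1)
    return float(np.mean(predicted == np.arange(len(targets))))


# Synthetic stand-in for a bilingual dictionary (hypothetical data):
# target embeddings are a hidden rotation of the source embeddings plus noise.
rng = np.random.default_rng(0)
dim, n_pairs = 16, 200
source_vecs = rng.normal(size=(n_pairs, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target_vecs = source_vecs @ true_rotation + 0.01 * rng.normal(size=(n_pairs, dim))

W = fit_orthogonal_procrustes(source_vecs, target_vecs)
accuracy = nearest_neighbor_accuracy(source_vecs @ W, target_vecs)
print(f"cosine nearest-neighbor accuracy: {accuracy:.3f}")
```

The cosine-based score here is the kind of purely mathematical measure the abstract contrasts with LMD. Under the paper's proposal, the scoring step would instead consult the language model that produced the embeddings.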
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational Physics and Python Applications
Methods: Procrustes
