Language Model Metrics and Procrustes Analysis for Improved Vector Transformation of NLP Embeddings

Thomas Conley; Jugal Kalita

arXiv:2106.02490 · cs.CL · June 7, 2021

TL;DR

This paper introduces a novel metric called Language Model Distance (LMD) for evaluating NLP embedding transformations, demonstrating its effectiveness through bilingual word mapping experiments using Procrustes analysis.

Contribution

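The paper proposes LMD as a new way to measure linguistic similarity among embedding vectors: instead of relying only on geometric distance, it consults the language model that produced the embeddings, so that the evaluation of vector transformations reflects the model's own notion of similarity.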
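The abstract does not spell out how LMD Accuracy is computed. As a rough illustration only, the sketch below assumes that "asking the language model" amounts to a nearest-neighbour lookup (cosine similarity) in the target model's own embedding table: a transformed vector counts as correct if its gold translation is among its k nearest neighbours. The function name `neighbor_accuracy`, its parameters, and the use of a raw embedding matrix as a stand-in for the language model are hypothetical and are not the paper's definition.

```python
import numpy as np


def neighbor_accuracy(mapped, target_emb, gold, k=1):
    """Hypothetical LMD-style check: fraction of mapped vectors whose gold
    target word is among the k nearest neighbours (cosine similarity) in the
    target embedding table.

    mapped:     (n, d) transformed source vectors
    target_emb: (V, d) target embedding table (stand-in for the LM)
    gold:       length-n sequence of gold row indices into target_emb
    """
    # L2-normalise rows so that a dot product equals cosine similarity
    m = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    sims = m @ t.T                           # (n, V) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # k most similar rows per query
    hits = [g in row for g, row in zip(gold, topk)]
    return float(np.mean(hits))


if __name__ == "__main__":
    target_emb = np.eye(4)                   # toy 4-word target vocabulary
    mapped = np.array([[1.0, 0.1, 0.0, 0.0],
                       [0.0, 0.0, 0.9, 0.2]])
    gold = [0, 3]
    print(neighbor_accuracy(mapped, target_emb, gold, k=1))
    print(neighbor_accuracy(mapped, target_emb, gold, k=2))
```

In the toy example the second vector lands closest to word 2 rather than its gold word 3, so it is only counted as correct once k is widened to 2; this is the kind of judgment a nearest-neighbour criterion makes that a raw distance threshold would not.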

Findings

1. LMD outperforms traditional distance metrics in evaluating embedding transformations.
2. Applying LMD to Procrustes-based bilingual mapping improves accuracy.
3. The method bridges the gap between mathematical and linguistic measures of similarity.

Abstract

Artificial neural networks are mathematical models at their core. This truism presents a fundamental difficulty when networks are tasked with Natural Language Processing. A key problem lies in measuring the similarity or distance among vectors in NLP embedding space, since the mathematical concept of distance does not always agree with the linguistic concept. We suggest that the best way to measure linguistic distance among vectors is by employing the Language Model (LM) that created them. We introduce Language Model Distance (LMD) for measuring the accuracy of vector transformations based on the Distributional Hypothesis (LMD Accuracy). We show the efficacy of this metric by applying it to a simple neural network learning the Procrustes algorithm for bilingual word mapping.
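For reference, the orthogonal Procrustes problem that the paper's network learns to approximate has a well-known closed-form solution: given paired source and target word vectors stacked as rows of X and Y, the orthogonal W minimizing ||XW − Y||_F is W = UVᵀ, where UΣVᵀ is the singular value decomposition of XᵀY. The NumPy sketch below computes that closed form directly; it is not the paper's neural-network training procedure, and `orthogonal_procrustes` is a hypothetical helper name chosen for illustration.

```python
import numpy as np


def orthogonal_procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||X @ W - Y||_F (closed-form SVD solution).

    X, Y: (n, d) arrays; row i of X is a source-language word vector and
    row i of Y is the vector of its translation (a seed dictionary).
    """
    # SVD of the d x d cross-covariance matrix X^T Y = U S V^T
    U, _, Vt = np.linalg.svd(X.T @ Y)
    # The maximiser of trace(W^T X^T Y) over orthogonal W is U V^T
    return U @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))                 # toy "source" embeddings
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # hidden random orthogonal map
    Y = X @ Q                                         # toy "target" embeddings
    W = orthogonal_procrustes(X, Y)
    print(np.allclose(W, Q))                          # checks that Q is recovered
```

Constraining W to be orthogonal preserves lengths and angles within the source space, so cosine neighbourhoods survive the mapping. SciPy exposes the same computation as `scipy.linalg.orthogonal_procrustes`, which returns the matrix together with a scale value.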

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational Physics and Python Applications

Methods: Procrustes