Exploring Intra and Inter-language Consistency in Embeddings with ICA

Rongzhi Li; Takeru Matsuda; Hitomi Yanaka

arXiv:2406.12474·cs.CL·June 19, 2024

TL;DR

This paper investigates the consistency of semantic axes derived from word embeddings within and across languages using ICA, applying statistical methods to verify their reliability and universality.

Contribution

It introduces a robust framework employing statistical tests to verify intra- and inter-language consistency of ICA-derived semantic axes in word embeddings.

Findings

01. ICA reveals consistent semantic axes within languages.

02. ICA-derived axes show significant correspondence across languages.

03. Statistical methods confirm the universality of semantic axes.

Abstract

Word embeddings represent words as multidimensional real vectors, facilitating data analysis and processing, but are often challenging to interpret. Independent Component Analysis (ICA) creates clearer semantic axes by identifying independent key features. Previous research has shown ICA's potential to reveal universal semantic axes across languages. However, it did not verify the consistency of independent components within and across languages. We investigated the consistency of semantic axes in two ways: within a single language and across multiple languages. We first examined intra-language consistency, focusing on the reproducibility of axes by performing ICA multiple times and clustering the outcomes. Then, we examined inter-language consistency by verifying those axes' correspondences using statistical tests. We newly applied statistical methods to…
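The reproducibility check described in the abstract can be illustrated with a small sketch. ICA components are only defined up to order and sign, so comparing two runs requires matching components by absolute correlation. The code below is a simplified stand-in, not the paper's procedure: it greedily pairs components from two runs rather than clustering many runs, and the names `pearson`, `match_components`, `run_a`, and `run_b` are hypothetical. It assumes each component is given as a list of per-word activations over the same vocabulary.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def match_components(run_a, run_b):
    """Greedily pair components of two ICA runs by absolute correlation.

    run_a, run_b: lists of components, each a list of per-word activations.
    Returns (index_in_a, index_in_b, abs_correlation) tuples sorted by index_in_a.
    This is an illustrative simplification, not the clustering used in the paper.
    """
    # Score every cross-run pair; abs() ignores the sign ambiguity of ICA.
    scores = sorted(
        (
            (abs(pearson(a, b)), i, j)
            for i, a in enumerate(run_a)
            for j, b in enumerate(run_b)
        ),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scores:
        # Take the best remaining pair whose components are both unmatched.
        if i not in used_a and j not in used_b:
            pairs.append((i, j, score))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)


# Toy data (made up): run_b is run_a with the components reordered
# and one of them sign-flipped.
run_a = [[1.0, 0.0, -1.0, 2.0, 0.5], [0.0, 3.0, 0.1, -0.2, 0.0]]
run_b = [[0.0, -3.0, -0.1, 0.2, 0.0], [1.0, 0.0, -1.0, 2.0, 0.5]]
pairs = match_components(run_a, run_b)
```

Matched pairs with absolute correlation close to 1 suggest that an axis was reproduced across runs; low values suggest an unstable component. With many runs, the same similarity measure could feed a clustering step, which is closer to what the abstract describes.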

Taxonomy

Topics: Neural Networks and Applications

Methods: Independent Component Analysis