Exploring Intra and Inter-language Consistency in Embeddings with ICA
Rongzhi Li, Takeru Matsuda, Hitomi Yanaka

TL;DR
This paper investigates the consistency of semantic axes derived from word embeddings within and across languages using ICA, applying statistical methods to verify their reliability and universality.
Contribution
It introduces a robust framework employing statistical tests to verify intra- and inter-language consistency of ICA-derived semantic axes in word embeddings.
Findings
ICA reveals consistent semantic axes within languages.
ICA-derived axes show significant correspondence across languages.
Statistical methods confirm the universality of semantic axes.
Abstract
Word embeddings represent words as multidimensional real vectors, facilitating data analysis and processing, but are often challenging to interpret. Independent Component Analysis (ICA) creates clearer semantic axes by identifying independent key features. Previous research has shown ICA's potential to reveal universal semantic axes across languages. However, it did not verify the consistency of independent components within and across languages. We investigated the consistency of semantic axes in two settings: within a single language and across multiple languages. We first probed intra-language consistency, focusing on the reproducibility of axes by performing ICA multiple times and clustering the outcomes. Then, we examined inter-language consistency by verifying those axes' correspondences using statistical tests. We newly applied statistical methods to…
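The intra-language check described in the abstract (running ICA several times and comparing the outcomes) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes synthetic Laplace-distributed sources in place of real word embeddings, uses a small NumPy FastICA written for this example, and substitutes a simple best-match absolute correlation against the first run for the clustering step the abstract mentions. The names `fast_ica` and `stability` are hypothetical helpers.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, seed, n_iter=200):
    """Symmetric FastICA with a tanh nonlinearity (illustrative helper).

    X has shape (n_samples, n_features); the result has shape
    (n_samples, n_components). A fixed iteration count is used instead
    of a convergence test to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten with the leading eigenvectors of the covariance matrix.
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(vals)[::-1][:n_components]
    Z = Xc @ (vecs[:, top] / np.sqrt(vals[top]))
    # Random orthonormal start; this is what differs between runs.
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, per row of W.
        W_new = G.T @ Z / len(Z) - (1.0 - G**2).mean(axis=0)[:, None] * W
        W = _sym_decorrelate(W_new)
    return Z @ W.T


def stability(runs):
    """For each component of the first run, average its best absolute
    correlation with the components of every other run.

    Absolute values are used because ICA components are only defined
    up to sign and ordering.
    """
    ref = runs[0]
    k = ref.shape[1]
    best = []
    for other in runs[1:]:
        cross = np.abs(np.corrcoef(ref.T, other.T)[:k, k:])
        best.append(cross.max(axis=1))
    return np.mean(best, axis=0)


# Synthetic stand-in for an embedding matrix (assumption, not real data):
# 4 non-Gaussian sources mixed into 6 dimensions.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(3000, 4))
X = sources @ rng.standard_normal((4, 6))

runs = [fast_ica(X, n_components=4, seed=s) for s in range(5)]
scores = stability(runs)
print(scores)
```

Scores close to 1 indicate that a component reappears across runs regardless of initialization; components with low scores would be the ones a reproducibility analysis flags as unreliable.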
Taxonomy
Topics: Neural Networks and Applications
Methods: Independent Component Analysis
