Understanding Higher-Order Correlations Among Semantic Components in Embeddings

Momose Oyama; Hiroaki Yamagiwa; Hidetoshi Shimodaira

arXiv:2409.19919 · cs.CL · October 10, 2024

TL;DR

This paper shows that the semantic components ICA extracts from embeddings are not fully independent: higher-order correlations persist between them, and these correlations indicate semantic associations and shared meanings among components.

Contribution

It quantifies higher-order correlations among ICA-derived semantic components, providing a novel visualization of their non-independencies and deeper insights into embedding structures.

Findings

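01. A large higher-order correlation between two components indicates a strong semantic association between them.

02. Such non-independent component pairs have many words that share meaning with both components.

03. A maximum spanning tree of the semantic components visualizes the entire structure of non-independencies.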
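The maximum spanning tree in the third finding can be illustrated with a minimal sketch. It assumes the pairwise higher-order correlations are already available as a symmetric matrix. The function name `maximum_spanning_tree`, the use of Kruskal's algorithm, and the toy matrix are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np

def maximum_spanning_tree(weights):
    """Return the edges (i, j, w) of a maximum spanning tree.

    `weights` is a symmetric (n, n) array of pairwise association
    strengths. Kruskal's algorithm with a small union-find is used
    here as one standard option.
    """
    n = weights.shape[0]
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider every unordered pair once, strongest association first.
    edges = [(weights[i, j], i, j) for i in range(n) for j in range(i + 1, n)]
    edges.sort(reverse=True)

    tree = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j, float(w)))
            if len(tree) == n - 1:
                break
    return tree

# Toy symmetric matrix standing in for higher-order correlations
# among four components (made-up values for illustration only).
toy = np.array([
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.3],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.3, 0.7, 0.0],
])
tree_edges = maximum_spanning_tree(toy)
print(tree_edges)
```

The tree keeps only n - 1 edges, so each component stays connected to the rest through its strongest associations, which makes the overall structure easier to read than the full matrix.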

Abstract

Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
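To make the idea of a non-independence that survives decorrelation concrete, here is a minimal sketch on synthetic data. The abstract does not give the paper's exact formula, so this sketch assumes one simple higher-order statistic: the Pearson correlation between squared, standardized components. The function name `higher_order_correlation` and the toy data are hypothetical.

```python
import numpy as np

def higher_order_correlation(S):
    """Correlation between squared, standardized components.

    S has shape (n_samples, n_components). This is an assumed
    stand-in for a higher-order correlation, not necessarily the
    definition used in the paper.
    """
    Z = (S - S.mean(axis=0)) / S.std(axis=0)
    return np.corrcoef(Z ** 2, rowvar=False)

rng = np.random.default_rng(0)
n = 20000

# Components 0 and 1 share a random scale, so they are linearly
# uncorrelated but not independent. Component 2 is independent of both.
scale = rng.choice([0.3, 1.7], size=n)
s0 = scale * rng.standard_normal(n)
s1 = scale * rng.standard_normal(n)
s2 = rng.standard_normal(n)
S = np.column_stack([s0, s1, s2])

linear = np.corrcoef(S, rowvar=False)   # ordinary (second-order) correlation
hoc = higher_order_correlation(S)       # correlation of squared components

print("linear corr(0,1):      ", round(float(linear[0, 1]), 3))
print("higher-order corr(0,1):", round(float(hoc[0, 1]), 3))
print("higher-order corr(0,2):", round(float(hoc[0, 2]), 3))
```

In this toy setup the ordinary correlation between components 0 and 1 is near zero, while the correlation of their squares is clearly positive. Dependence of this kind is invisible to second-order statistics.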

Code & Models

Repositories

momoseoyama/hoc
Official

Videos

Understanding Higher-Order Correlations Among Semantic Components in Embeddings · Underline

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Independent Component Analysis