Interpreting Embedding Spaces by Conceptualization
Adi Simhi, Shaul Markovitch

TL;DR
This paper introduces a novel method to interpret embedding spaces by transforming them into comprehensible conceptual spaces, enabling better understanding, debugging, and bias detection in large language models.
Contribution
The paper presents an algorithm for deriving a dynamic, human-understandable conceptual space from latent embeddings, along with an evaluation method using human and LLM raters.
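As a rough illustration of what mapping a latent vector into a conceptual space with on-demand granularity could look like, here is a minimal sketch. It is not the authors' algorithm. It assumes that each concept has an embedding in the same latent space, that a conceptual coordinate is the cosine similarity between the text vector and a concept vector, and that finer concepts come from a parent-to-children hierarchy. The function names (`conceptualize`, `refine`), the concept names, and the toy 3-dimensional vectors are all hypothetical.

```python
import numpy as np


def conceptualize(text_vec, concept_vecs):
    """Score a latent vector against named concepts (assumed: cosine similarity)."""
    names = list(concept_vecs)
    mat = np.array([concept_vecs[n] for n in names], dtype=float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # unit-length concept rows
    v = np.asarray(text_vec, dtype=float)
    v = v / np.linalg.norm(v)                           # unit-length text vector
    return dict(zip(names, (mat @ v).tolist()))


def refine(text_vec, concept_vecs, children, top_k=1):
    """Replace the top-k scoring concepts with their children, then re-score."""
    scores = conceptualize(text_vec, concept_vecs)
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Keep every concept except the top ones that can be expanded.
    refined = {n: v for n, v in concept_vecs.items()
               if n not in top or n not in children}
    for n in top:
        refined.update(children.get(n, {}))
    return conceptualize(text_vec, refined)


# Toy data (hypothetical): three coarse concepts and one expandable parent.
concepts = {"Science": [1, 0, 0], "Sports": [0, 1, 0], "Art": [0, 0, 1]}
children = {"Science": {"Physics": [0.9, 0.1, 0.0], "Biology": [0.8, 0.0, 0.3]}}
text_vec = [0.9, 0.1, 0.1]  # stands in for a model's embedding of some text

coarse = conceptualize(text_vec, concepts)
fine = refine(text_vec, concepts, children)
print(max(coarse, key=coarse.get), max(fine, key=fine.get))
```

In this sketch the output is a dictionary keyed by human-readable concept names, so each coordinate can be inspected directly, and `refine` only expands the concepts that score highest for the given text.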
Findings
Conceptualized vectors accurately represent original semantics.
The method enables comparison of different models' semantics.
It allows tracing of LLM layer semantics.
Abstract
One of the main methods for computational interpretation of a text is mapping it into a vector in some embedding space. Such vectors can then be used for a variety of text processing tasks. Today, most embedding spaces are produced by training large language models (LLMs). One major drawback of this type of representation is its incomprehensibility to humans. Understanding the embedding space is crucial for several important needs, including the need to debug the embedding method and compare it to alternatives, and the need to detect biases hidden in the model. In this paper, we present a novel method of understanding embeddings by transforming a latent embedding space into a comprehensible conceptual space. We present an algorithm for deriving a conceptual space with dynamic on-demand granularity. We devise a new evaluation method, using either human raters or LLM-based raters,…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
