Aligned at the Start: Conceptual Groupings in LLM Embeddings
Mehrdad Khatir, Sanchit Kabra, Chandan K. Reddy

TL;DR
This paper investigates the structure of input embeddings in large language models, revealing categorical communities aligned with human concepts, and demonstrates that manipulating these communities can reduce ethnicity bias.
Contribution
It introduces a novel analysis of LLM input embeddings using graph and community detection methods, uncovering fundamental conceptual groupings and their potential for bias mitigation.
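The contribution rests on two steps: building a graph over embedding vectors, then running community detection on that graph. The sketch below shows both steps under stated assumptions. The toy 2-D vectors and token names are hypothetical stand-ins for a real input-embedding matrix. Each token is linked to its k most cosine-similar tokens. Communities are found with label propagation, a simple algorithm swapped in here only because it fits in standard-library Python. This summary does not say which community detection algorithm the paper actually uses.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(emb, k):
    """Undirected k-NN graph: each token is linked to its k most cosine-similar tokens."""
    adj = {t: set() for t in emb}
    for t in emb:
        ranked = sorted(
            (o for o in emb if o != t),
            key=lambda o: cosine(emb[t], emb[o]),
            reverse=True,
        )
        for o in ranked[:k]:
            adj[t].add(o)
            adj[o].add(t)  # symmetrize: keep the edge if either endpoint selects it
    return adj


def label_propagation(adj, max_iter=100):
    """Each node repeatedly adopts its neighbors' most common label (ties -> smallest label)."""
    labels = {t: t for t in adj}
    for _ in range(max_iter):
        changed = False
        for t in sorted(adj):
            if not adj[t]:
                continue
            counts = Counter(labels[n] for n in adj[t])
            top = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == top)
            if new != labels[t]:
                labels[t] = new
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy "embeddings": two loose concept groups (animals, cities).
emb = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2], "lion": [1.0, 0.0],
    "paris": [0.1, 1.0], "rome": [0.0, 0.9], "tokyo": [0.2, 1.0],
}
labels = label_propagation(knn_graph(emb, k=2))
communities = {}
for tok, lab in labels.items():
    communities.setdefault(lab, []).append(tok)
print(sorted(sorted(c) for c in communities.values()))
# prints [['cat', 'dog', 'lion'], ['paris', 'rome', 'tokyo']]
```

On real vocabularies with tens of thousands of tokens, the quadratic neighbor search would normally be replaced by a vectorized or approximate nearest-neighbor index. A more robust community method such as Louvain or Leiden would also be a better fit there.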
Findings
Embeddings form significant categorical communities aligned with human concepts.
Cross-model embedding alignments show medium to high consistency.
Manipulating groupings can mitigate ethnicity bias in LLM tasks.
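Cross-model alignment requires comparing embedding tables whose dimensions may differ. One common way to do this, assumed here rather than taken from the paper, is to compare neighborhoods instead of raw vectors. For every token that both vocabularies share, take its k nearest neighbors in each model, then average the Jaccard overlap of the two neighbor sets. The function names and toy vectors below are hypothetical. Real models use different tokenizers, so the shared-token mapping is itself an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbors(emb, token, k):
    """Set of the k tokens most cosine-similar to `token` within one model's table."""
    ranked = sorted(
        (o for o in emb if o != token),
        key=lambda o: cosine(emb[token], emb[o]),
        reverse=True,
    )
    return set(ranked[:k])


def neighborhood_agreement(emb_a, emb_b, k):
    """Mean Jaccard overlap of each shared token's k-NN set in model A vs. model B.

    Only cosine rankings are compared, so the two models may use different
    embedding dimensions; 1.0 means identical neighborhoods, 0.0 means disjoint.
    """
    shared = sorted(set(emb_a) & set(emb_b))
    if k < 1 or len(shared) <= k:
        raise ValueError("need k >= 1 and more than k shared tokens")
    a = {t: emb_a[t] for t in shared}  # restrict both tables to the shared vocabulary
    b = {t: emb_b[t] for t in shared}
    total = 0.0
    for t in shared:
        na, nb = neighbors(a, t, k), neighbors(b, t, k)
        total += len(na & nb) / len(na | nb)
    return total / len(shared)


model_a = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "lion": [1.0, 0.0],
           "paris": [0.1, 1.0], "rome": [0.0, 0.9], "tokyo": [0.2, 1.0]}
# Hypothetical second model: the same geometry scaled into 3-D, so neighborhoods match.
model_b = {t: [2 * x, 2 * y, 0.0] for t, (x, y) in model_a.items()}
print(neighborhood_agreement(model_a, model_b, k=2))  # prints 1.0
```

Because the score depends only on neighbor rankings, it is unaffected by rotation, scaling, or a change of dimension. That property suits comparisons between models trained independently.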
Abstract
This paper shifts focus to the often-overlooked input embeddings, the initial representations fed into transformer blocks. Using fuzzy graphs, k-nearest neighbors (k-NN), and community detection, we analyze embeddings from diverse LLMs and find significant categorical community structure aligned with predefined, human-aligned concepts and categories. We observe that these groupings exhibit within-cluster organization (such as hierarchies and topological ordering), which leads us to hypothesize a fundamental structure that precedes contextual processing. To further investigate the conceptual nature of these groupings, we explore cross-model alignments across different LLM categories within their input embeddings and observe a medium to high degree of alignment. Furthermore, we provide evidence that manipulating these groupings can play a functional role in mitigating ethnicity bias in LLM tasks.
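The abstract lists a fuzzy graph alongside k-NN but does not explain how the fuzzy graph is built. The sketch below assumes a simplified, UMAP-style construction, which this summary does not confirm. Each point receives a directed membership strength exp(-(d - rho) / sigma) to each of its k nearest neighbors. Here rho is the distance to the closest neighbor, and sigma is calibrated by bisection so that the k strengths sum to log2(k). The two directions of each edge are then merged with the fuzzy union w + w' - w * w'. Euclidean distance, the toy points, and the function names are all hypothetical choices.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def calibrate_sigma(gaps, target, iters=64):
    """Bisect for sigma so that sum(exp(-gap / sigma)) over the gaps matches target."""
    lo, hi = 0.0, 1.0
    while sum(math.exp(-g / hi) for g in gaps) < target:
        hi *= 2  # widen the bracket until the sum reaches the target
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(math.exp(-g / mid) for g in gaps) < target:
            lo = mid
        else:
            hi = mid
    return hi


def fuzzy_knn_graph(points, k):
    """Fuzzy k-NN graph, roughly UMAP-style: {frozenset({p, q}): membership in [0, 1]}."""
    if k < 1 or len(points) <= k:
        raise ValueError("need k >= 1 and more than k points")
    directed = {}
    for p in points:
        knn = sorted((euclidean(points[p], points[q]), q) for q in points if q != p)[:k]
        rho = knn[0][0]  # the nearest neighbor always gets membership 1
        gaps = [d - rho for d, _ in knn]
        sigma = calibrate_sigma(gaps, math.log2(k))
        for (_, q), g in zip(knn, gaps):
            directed[(p, q)] = math.exp(-g / sigma)
    graph = {}
    for (p, q), w in directed.items():
        w_back = directed.get((q, p), 0.0)
        # Fuzzy union (probabilistic sum) of the two directed memberships.
        graph[frozenset((p, q))] = w + w_back - w * w_back
    return graph


# Hypothetical toy vectors: a tight cluster (a, b, c) and a distant pair (d, e).
points = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0],
          "d": [5.0, 5.0], "e": [5.0, 6.0]}
for edge, w in sorted(fuzzy_knn_graph(points, k=3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(edge), round(w, 3))
```

With k = 2, the target log2(2) = 1 is already met by the nearest neighbor alone, so sigma collapses toward zero and every other membership vanishes. For that reason the example uses k = 3. The resulting weighted graph could then be passed to a community detection step.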
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · SentencePiece · Gated Linear Unit · Adam · Attention Dropout · Dropout · Inverse Square Root Schedule · Adafactor
