Detecting Conceptual Abstraction in LLMs
Michaela Regneri, Alhassan Abdelhalim, Sören Laue

TL;DR
This paper introduces a method for detecting noun abstraction in large language models by analyzing the attention matrices BERT produces for hypernymy patterns. Comparison with two sets of counterfactuals shows that the detected signal cannot be explained by distributional similarity alone.
Contribution
It presents a novel approach using attention analysis and counterfactuals to identify hypernymy, advancing explainability of conceptual abstraction in LLMs.
Findings
Detected hypernymy through attention matrices
Distinguished abstraction from distributional similarity
First step towards explainability of conceptual abstraction
Abstract
We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.
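The pipeline described in the abstract (instantiate a surface pattern for a noun pair, then read off the attention between the two nouns) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pattern string, the function names `instantiate` and `pair_attention`, and the toy attention values are all assumptions made for this example, and the sketch assumes each noun is a single word at a known position. In practice the attention matrices would come from BERT itself rather than from a hand-built list.

```python
from typing import List, Tuple

# Hypothetical surface pattern indicating hypernymy; the patterns actually
# used in the paper are not listed in this summary.
PATTERN = "a {hypo} is a kind of {hyper}"


def instantiate(hypo: str, hyper: str, pattern: str = PATTERN) -> Tuple[str, int, int]:
    """Fill the pattern with a noun pair.

    Returns the sentence and the word positions of the hyponym and hypernym.
    Assumes both nouns are single words.
    """
    words = pattern.split()
    hypo_idx = words.index("{hypo}")
    hyper_idx = words.index("{hyper}")
    words[hypo_idx] = hypo
    words[hyper_idx] = hyper
    return " ".join(words), hypo_idx, hyper_idx


def pair_attention(attn: List[List[List[float]]], src: int, tgt: int) -> float:
    """Average, over heads, the attention weight from position src to position tgt.

    attn is a list of heads, each a seq_len x seq_len matrix (list of rows).
    """
    return sum(head[src][tgt] for head in attn) / len(attn)


# Toy stand-in for one layer's attention (2 heads, 7 positions), NOT model output.
sentence, hypo_idx, hyper_idx = instantiate("dog", "animal")
seq_len = len(sentence.split())
toy_attn = [[[0.0] * seq_len for _ in range(seq_len)] for _ in range(2)]
toy_attn[0][hypo_idx][hyper_idx] = 0.5
toy_attn[1][hypo_idx][hyper_idx] = 0.25

score = pair_attention(toy_attn, hypo_idx, hyper_idx)
```

The same score could then be computed for counterfactual pairs filled into the same pattern and compared against the score for the taxonomic pair. With a real model, word positions would also need to be mapped to subword token positions, which this sketch leaves out.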
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Attention Is All You Need · Sparse Evolutionary Training · Weight Decay · Linear Layer · Adam · Linear Warmup With Linear Decay · Layer Normalization · Multi-Head Attention · Dropout
