Word Meanings in Transformer Language Models

Jumbly Grindrod; Peter Grindrod

arXiv:2508.12863·cs.CL·August 19, 2025

TL;DR

This paper explores how transformer language models encode word meanings by clustering token embeddings and analyzing their sensitivity to semantic and psycholinguistic features, revealing rich semantic information in the embeddings.

Contribution

It demonstrates that transformer models encode diverse semantic information in token embeddings, challenging meaning eliminativist hypotheses.

Findings

01. Token embedding clusters are sensitive to semantic features

02. Transformer models encode a wide range of psycholinguistic information

03. Results challenge theories that deny semantic content in LLM representations

Abstract

We investigate how word meanings are represented in transformer language models. Specifically, we focus on whether transformer models employ something analogous to a lexical store, in which each word has an entry that contains semantic information. To do this, we extracted the token embedding space of RoBERTa-base and clustered it with k-means into 200 clusters. In our first study, we manually inspected the resulting clusters to assess whether they are sensitive to semantic information. In our second study, we tested whether the clusters are sensitive to five psycholinguistic measures: valence, concreteness, iconicity, taboo, and age of acquisition. Overall, our findings were very positive: a wide variety of semantic information is encoded within the token embedding space. This serves to rule out certain "meaning eliminativist" hypotheses about how transformer LLMs process…
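The clustering step described in the abstract can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) over a small matrix of vectors. This is not the authors' code: the function names `squared_distance` and `kmeans`, the deterministic evenly spaced initialisation, and the two-dimensional toy vectors are all assumptions made for illustration. In the paper's setting the rows would be the RoBERTa-base token embeddings and `k` would be 200.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=50):
    """Cluster `vectors` into `k` groups with Lloyd's algorithm.

    Returns (labels, centroids). Initial centroids are taken at evenly
    spaced positions in the input so the sketch is deterministic; this
    initialisation is an assumption, not something the source specifies.
    """
    n = len(vectors)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of vectors")

    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = [-1] * n

    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return labels, centroids


# Toy stand-in for an embedding matrix: two well-separated groups of 2-d vectors.
toy_embeddings = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
toy_labels, toy_centroids = kmeans(toy_embeddings, k=2)
```

With real token embeddings, each resulting cluster would be a set of vocabulary items whose vectors lie close together, which is the object that the two studies then examine.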

Taxonomy

Topics: Natural Language Processing Techniques