Iconicity in Large Language Models
Anna Marklová, Jiří Milička, Leonid Ryvkin, Ľudmila Lacková Bennet, Libuše Kormaníková

TL;DR
This paper investigates how large language models encode lexical iconicity by having them generate pseudowords in artificial languages and testing whether the meanings can be guessed, revealing that LLMs guess the meanings of iconic pseudowords more accurately than humans and that their handling of iconicity differs from human processing.
Contribution
It demonstrates that LLMs can generate and recognize iconic pseudowords in artificial languages, highlighting differences from human processing of iconicity.
Findings
Humans guess the meanings of pseudowords in the generated iconic language more accurately than the meanings of words in natural languages.
LLMs outperform humans in guessing meanings of iconic pseudowords.
The generated languages show signs of universality, and humans and LLMs rely on shared cues when guessing meanings.
Abstract
Lexical iconicity, a direct relation between a word's meaning and its form, is an important aspect of every natural language, most commonly manifesting through sound-meaning associations. Since large language models (LLMs) have only mediated access to both the meaning and the sound of text (meaning through textual context, sound through written representation, further complicated by tokenization), we might expect that the encoding of iconicity in LLMs would be either insufficient or significantly different from human processing. This study addresses this hypothesis by having GPT-4 generate highly iconic pseudowords in artificial languages. To verify that these words actually carry iconicity, we had their meanings guessed by Czech and German participants (n=672) and subsequently by LLM-based participants (generated by GPT-4 and Claude 3.5 Sonnet). The results revealed that humans can guess the…
Taxonomy
Topics: Language, Metaphor, and Cognition · Multisensory perception and integration · Categorization, perception, and language
Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
