Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing
Eshaan Tanwar, Gayatri Oke, Tanmoy Chakraborty

TL;DR
This paper evaluates how well multilingual LLMs process bilingual words. It finds that they struggle to semantically disambiguate interlingual homographs and rely more on orthography than on meaning.
Contribution
The paper provides a detailed analysis of how multilingual LLMs handle cross-lingual lexical phenomena, highlighting their limitations in the semantic disambiguation of interlingual homographs.
Findings
LLMs perform well on cognates and non-cognates in isolation
LLMs struggle with interlingual homographs, often performing below random chance
Models rely more on orthography than on semantic context
Abstract
Bilingual lexical processing is shaped by the complex interplay of phonological, orthographic, and semantic features of two languages within an integrated mental lexicon. In humans, this is evident in the ease with which cognate words - words similar in both orthographic form and meaning (e.g., blind, meaning "sightless" in both English and German) - are processed, compared to the challenges posed by interlingual homographs, which share orthographic form but differ in meaning (e.g., gift, meaning "present" in English but "poison" in German). We investigate how multilingual Large Language Models (LLMs) handle such phenomena, focusing on English-Spanish, English-French, and English-German cognates, non-cognates, and interlingual homographs. Specifically, we evaluate their ability to disambiguate meanings and make semantic judgments, whether these word types are presented in isolation or…
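The word types above differ along two axes: whether the two words share orthographic form, and whether they share meaning. The following is a minimal sketch of that scheme. It assumes "similar form" can be approximated by a normalized Levenshtein similarity with an arbitrary illustrative threshold, because the paper's own criterion is not given here. Meaning equivalence is treated as an annotated input rather than something computed. The names `orthographic_similarity`, `WordPair`, and `word_type`, and the extra `unrelated` label, are hypothetical. The non-cognate example (English "dog", Spanish "perro") is a standard translation pair with different spellings.

```python
from dataclasses import dataclass


def orthographic_similarity(a: str, b: str) -> float:
    """Hypothetical form-overlap score: 1 - normalized Levenshtein distance (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[len(b)] / max(len(a), len(b))


@dataclass
class WordPair:
    word_l1: str        # e.g. the English word
    word_l2: str        # e.g. the German, French, or Spanish word
    same_meaning: bool  # assumed to come from an annotator or dictionary


def word_type(pair: WordPair, threshold: float = 0.8) -> str:
    """Place a word pair in the form x meaning grid (threshold is an arbitrary assumption)."""
    similar_form = orthographic_similarity(pair.word_l1, pair.word_l2) >= threshold
    if similar_form and pair.same_meaning:
        return "cognate"
    if similar_form:
        return "interlingual homograph"
    if pair.same_meaning:
        return "non-cognate"
    return "unrelated"  # hypothetical fourth cell, not discussed in the abstract


print(word_type(WordPair("blind", "blind", True)))   # cognate
print(word_type(WordPair("gift", "Gift", False)))    # interlingual homograph
print(word_type(WordPair("dog", "perro", True)))     # non-cognate
```

A string-distance proxy keeps the sketch self-contained. Real stimulus sets are usually curated by hand, and closely related cognates with small spelling differences may fall on either side of any fixed threshold.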
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Lexicography and Language Studies
Methods: OPT
