Typoglycemia under the Hood: Investigating Language Models' Understanding of Scrambled Words
Gianluca Sperduti, Alejandro Moreo

TL;DR
This study investigates how language models, specifically BERT, handle English words whose internal letters are scrambled. It finds that their robustness to typoglycemia comes from two factors: few words collapse into the same form, and context disambiguates those that do.
Contribution
The paper analyzes the impact of typoglycemia on language models, quantifies word collapse and ambiguity, and evaluates BERT's disambiguation ability with scrambled words.
Findings
Performance degradation due to scrambling is smaller than expected.
Few English words collapse under typoglycemia, aiding disambiguation.
Contextual cues help models distinguish scrambled words effectively.
Abstract
Research in linguistics has shown that humans can read words with internally scrambled letters, a phenomenon recently dubbed typoglycemia. Some specific NLP models have recently been proposed that similarly demonstrate robustness to such distortions by ignoring the internal order of characters by design. This raises a fundamental question: how can models perform well when many distinct words (e.g., form and from) collapse into identical representations under typoglycemia? Our work, focusing exclusively on the English language, seeks to shed light on the underlying aspects responsible for this robustness. We hypothesize that the main reasons have to do with the fact that (i) relatively few English words collapse under typoglycemia, and that (ii) collapsed words tend to occur in contexts so distinct that disambiguation becomes trivial. In our analysis, we (i) analyze the British National…
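The notion of word collapse described in the abstract can be illustrated with a minimal sketch. It assumes a common reading of typoglycemia in which the first and last letters stay fixed and only the internal letters are unordered; the helper names `typo_key` and `collapse_groups` and the toy vocabulary are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def typo_key(word: str) -> str:
    """Order-insensitive key: first letter, sorted interior, last letter.

    Assumption: words of three letters or fewer cannot be scrambled
    internally, so they are returned unchanged (lowercased).
    """
    w = word.lower()
    if len(w) <= 3:
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def collapse_groups(vocab):
    """Group words sharing a key; keep only groups with more than one word."""
    groups = defaultdict(set)
    for word in vocab:
        groups[typo_key(word)].add(word.lower())
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}


# Toy vocabulary (illustrative only, not data from the paper).
vocab = ["form", "from", "salt", "slat", "model", "word", "context"]
groups = collapse_groups(vocab)
print(groups)  # {'form': ['form', 'from'], 'salt': ['salt', 'slat']}
```

Running the same grouping over a large word list would give a rough estimate of how many words collapse. Whether collapsed words can then be told apart from context is a separate question that this sketch does not address.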
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Language and cultural evolution
