Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

Itay Itzhak; Omer Levy

arXiv:2108.11193 · cs.CL · June 9, 2022

TL;DR

Pretrained language models implicitly learn the character composition of tokens, enabling them to spell words without explicit character-level training, and adding spelling information does not significantly improve their performance.

Contribution

This study reveals that language models inherently acquire character-level knowledge of tokens, challenging assumptions about the need for explicit spelling training.

Findings

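01. The embedding layer of RoBERTa holds enough information to accurately spell up to a third of the vocabulary.

02. Spellings recovered from the embeddings reach high average character n-gram overlap on all token types.

03. Enriching subword models with additional character information yields a learning curve nearly identical to training without spelling-based enrichment.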
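The first two findings rest on two kinds of score: exact spelling accuracy and character n-gram overlap. The sketch below shows one plausible way to compute both for pairs of predicted and true spellings. The function names, the choice of bigrams, and the F1 formulation are illustrative assumptions, not the paper's exact metric definitions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of `text` (empty if len(text) < n)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_f1(predicted: str, reference: str, n: int = 2) -> float:
    """F1 over character n-gram counts (assumed metric, not the paper's)."""
    pred = char_ngrams(predicted, n)
    ref = char_ngrams(reference, n)
    if not pred or not ref:
        # Strings too short to contain an n-gram: fall back to exact match.
        return 1.0 if predicted == reference else 0.0
    overlap = sum((pred & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def exact_match_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (predicted, reference) pairs spelled exactly right."""
    return sum(p == r for p, r in pairs) / len(pairs)
```

A near-miss such as `ngram_f1("helo", "hello")` still scores well above zero while counting as a failure under `exact_match_rate`, which is why the two numbers are reported separately.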

Abstract

Standard pretrained language models operate on sequences of subword tokens without direct access to the characters that compose each token's string representation. We probe the embedding layer of pretrained language models and show that models learn the internal character composition of whole word and subword tokens to a surprising extent, without ever seeing the characters coupled with the tokens. Our results show that the embedding layer of RoBERTa holds enough information to accurately spell up to a third of the vocabulary and reach high average character ngram overlap on all token types. We further test whether enriching subword models with additional character information can improve language modeling, and observe that this method has a near-identical learning curve as training without spelling-based enrichment. Overall, our results suggest that language modeling objectives…
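To make the probing setup concrete, here is a minimal sketch of how spelling-probe data could be prepared: each token id is paired with the characters of its string, and part of the vocabulary is held out so the probe is tested on tokens whose spelling it never saw. The function name, the `held_out_fraction` default, and the random split are hypothetical choices for illustration; the abstract does not specify the authors' actual split or probe architecture.

```python
import random


def build_spelling_probe_data(
    vocab: dict[str, int],
    held_out_fraction: float = 0.2,  # assumed value, not from the paper
    seed: int = 0,
) -> tuple[list[tuple[int, list[str]]], list[tuple[int, list[str]]]]:
    """Pair each token id with its character sequence and split train/test.

    `vocab` maps token strings to ids, as subword tokenizers typically expose.
    A probe would read the embedding row for each id and be trained to emit
    the paired characters; only the data preparation is sketched here.
    """
    examples = [
        (token_id, list(token))
        for token, token_id in sorted(vocab.items(), key=lambda kv: kv[1])
    ]
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(examples)
    n_test = int(len(examples) * held_out_fraction)
    return examples[n_test:], examples[:n_test]


# Toy vocabulary standing in for a real subword vocabulary.
toy_vocab = {"the": 0, "cat": 1, "ing": 2, "spell": 3, "bee": 4}
train, test = build_spelling_probe_data(toy_vocab)
```

Keeping the test tokens disjoint from the training tokens matters: otherwise a probe could memorize spellings rather than read them out of the embeddings.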

Code & Models

Repositories

itay1itzhak/spellingbee (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dense Connections · Dropout · Weight Decay · Residual Connection · Multi-Head Attention · Adam · Softmax