What do tokens know about their characters and how do they know it?

Ayush Kaushal; Kyle Mahowald

arXiv:2206.02608 · cs.CL · June 7, 2022 · 1 citation

TL;DR

Pre-trained language models encode detailed character-level information in their token embeddings. Probing classifiers recover this information across several scripts and model sizes, and further analyses shed light on how models acquire this knowledge during training.

Contribution

This study systematically probes how pre-trained models encode character information. It shows that classifiers trained on token embeddings can predict whether a token contains a given character, and it analyzes the mechanisms by which models acquire this knowledge.
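To make the probing target concrete, here is a minimal sketch of how binary "contains this character" labels might be derived from a vocabulary. The marker prefixes (`##`, `Ġ`, `▁`), the case-insensitive matching, and the helper names `surface_form` and `char_labels` are illustrative assumptions, not the authors' preprocessing.

```python
# Hypothetical subword-marker prefixes; real tokenizers differ
# (e.g. WordPiece marks continuation pieces with "##", GPT-2-style BPE
# marks a leading space with "Ġ", SentencePiece uses "▁").
SUBWORD_MARKERS = ("##", "Ġ", "▁")


def surface_form(token: str) -> str:
    """Strip one leading subword marker so only the token's characters remain."""
    for marker in SUBWORD_MARKERS:
        if token.startswith(marker):
            return token[len(marker):]
    return token


def char_labels(vocab: list[str], char: str) -> list[int]:
    """Label each token 1 if its surface form contains `char`, else 0.

    Matching is case-insensitive here, which is an assumption.
    """
    target = char.lower()
    return [int(target in surface_form(tok).lower()) for tok in vocab]


vocab = ["cat", "##at", "Ġdog", "кот", "कम"]
print(char_labels(vocab, "a"))  # [1, 1, 0, 0, 0]
print(char_labels(vocab, "т"))  # Cyrillic "т": [0, 0, 0, 1, 0]
print(char_labels(vocab, "क"))  # Devanagari "क": [0, 0, 0, 0, 1]
```

Because Python's `in` compares Unicode code points, the same check works unchanged for Cyrillic or Devanagari tokens. Devanagari vowel signs are separate code points, though, so what counts as a "character" in that script is itself a modeling choice.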

Findings

01

Models robustly encode character information, and the results generalize to non-Latin scripts (Arabic, Devanagari, and Cyrillic).

02

In general, larger models encode character information better.

03

Character knowledge is acquired through multiple phenomena during training.

Abstract

Pre-trained language models (PLMs) that use subword tokenization schemes can succeed at a variety of language tasks that require character-level information, despite lacking explicit access to the character composition of tokens. Here, studying a range of models (e.g., GPT-J, BERT, RoBERTa, GloVe), we probe what word pieces encode about character-level information by training classifiers to predict the presence or absence of a particular alphabetical character in a token, based on its embedding (e.g., probing whether the model embedding for "cat" encodes that it contains the character "a"). We find that these models robustly encode character-level information and, in general, larger models perform better at the task. We show that these results generalize to characters from non-Latin alphabets (Arabic, Devanagari, and Cyrillic). Then, through a series of experiments and analyses, we…
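The probing setup described in the abstract (a classifier that reads a token embedding and predicts whether a character is present) can be sketched with a logistic-regression probe. This is a minimal, dependency-free sketch under stated assumptions: the embeddings come from a hypothetical `toy_embeddings` generator rather than a real model (in practice they would be the model's token embeddings), and logistic regression is just one simple probe choice that may differ from the classifier used in the paper.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_probe(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression probe (weights w, bias b) by full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = sigmoid(dot(w, x) + b) - label  # d(log-loss)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of examples where the thresholded probe output matches the label."""
    preds = [int(sigmoid(dot(w, x) + b) >= 0.5) for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def toy_embeddings(n=40, dim=4, seed=0):
    """Hypothetical stand-in for token embeddings: dimension 0 carries the
    'contains the character' signal, the remaining dimensions are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = k % 2
        signal = (1.0 if label else -1.0) + rng.gauss(0.0, 0.3)
        X.append([signal] + [rng.gauss(0.0, 1.0) for _ in range(dim - 1)])
        y.append(label)
    return X, y


X, y = toy_embeddings()
w, b = train_probe(X[:30], y[:30])  # fit on the first 30 "tokens"
print(f"held-out accuracy: {accuracy(w, b, X[30:], y[30:]):.2f}")
```

Scoring the probe on held-out tokens (here the last 10) keeps it from getting credit for simply memorizing the tokens it was trained on.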

Code & Models

Repositories

ayushk4/character-probing-pytorch
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Softmax · Layer Normalization · Attention Dropout · WordPiece · Adam