TL;DR
VAST is an intrinsic evaluation method for contextualized word embeddings that measures how well models encode word semantics related to pleasantness, revealing insights into contextualization, tokenization, and bias masking in language models.
Contribution
The paper introduces VAST, a new intrinsic evaluation task for contextualized word embeddings (CWEs) that assesses how well they encode valence, and shows that the task is useful for GPT-2, 7 other LMs, and 7 languages.
Findings
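In GPT-2, a word's semantics incorporate more of the context's semantics in layers closer to the model output; VAST scores diverge between contextual settings, ranging from Pearson's rho of .55 to .77 in layer 11.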
Multiply tokenized words are semantically encoded starting at layer 8.
Removing non-semantic components improves semantic similarity scores.
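The last finding can be illustrated with a minimal sketch. It assumes that the "non-semantic components" are the highest-variance principal directions of a set of embeddings, which the summary above does not spell out. The function name `remove_top_components` and the parameter `k` are hypothetical and are not taken from the paper.

```python
import numpy as np

def remove_top_components(embeddings, k=1):
    """Project the k highest-variance directions out of a set of embeddings.

    Assumption: the dominant principal directions carry mostly non-semantic
    information. `k` is a free parameter in this sketch.
    """
    X = np.asarray(embeddings, dtype=float)
    # Center the embeddings so the SVD yields principal directions.
    centered = X - X.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:k]
    # Subtract each embedding's projection onto the top-k directions.
    return centered - centered @ top.T @ top
```

Similarity scores would then be computed on the returned vectors rather than on the raw embeddings. The returned vectors are also mean-centered, which is a design choice of this sketch.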
Abstract
VAST, the Valence-Assessing Semantics Test, is a novel intrinsic evaluation task for contextualized word embeddings (CWEs). VAST uses valence, the association of a word with pleasantness, to measure the correspondence of word-level LM semantics with widely used human judgments, and examines the effects of contextualization, tokenization, and LM-specific geometry. Because prior research has found that CWEs from GPT-2 perform poorly on other intrinsic evaluations, we select GPT-2 as our primary subject, and include results showing that VAST is useful for 7 other LMs, and can be used in 7 languages. GPT-2 results show that the semantics of a word incorporate the semantics of context in layers closer to model output, such that VAST scores diverge between our contextual settings, ranging from Pearson's rho of .55 to .77 in layer 11. We also show that multiply tokenized words are not…
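The scoring idea in the abstract, correlating an embedding-based valence association with human valence ratings, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation. It assumes the valence association is a WEAT-style effect size (difference in mean cosine similarity to pleasant and unpleasant attribute vectors, divided by the pooled standard deviation). All function names and the toy vectors in the usage note are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def valence_association(w, pleasant, unpleasant):
    """WEAT-style effect size of word vector w (an assumption of this sketch)."""
    sims_p = np.array([cosine(w, a) for a in pleasant])
    sims_u = np.array([cosine(w, b) for b in unpleasant])
    pooled = np.concatenate([sims_p, sims_u])
    # Difference of mean similarities, scaled by the pooled sample std.
    return float((sims_p.mean() - sims_u.mean()) / pooled.std(ddof=1))

def vast_style_score(word_vecs, human_ratings, pleasant, unpleasant):
    """Pearson correlation between valence associations and human ratings."""
    assoc = [valence_association(w, pleasant, unpleasant) for w in word_vecs]
    return float(np.corrcoef(assoc, human_ratings)[0, 1])
```

A toy call with made-up 2-D vectors and ratings, only to show the shapes involved:

```python
pleasant = np.array([[1.0, 0.1], [1.0, -0.1]])
unpleasant = np.array([[-1.0, 0.1], [-1.0, -0.1]])
words = np.array([[1.0, 0.0], [-1.0, 0.0], [0.5, 1.0], [-0.5, 1.0]])
ratings = [9.0, 1.0, 6.0, 4.0]  # invented ratings, not real lexicon values
score = vast_style_score(words, ratings, pleasant, unpleasant)
```

In practice the word vectors would come from a chosen layer of a language model, and the ratings from a human valence lexicon.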
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection
