Correlation and Navigation in the Vocabulary Key Representation Space of Language Models

Letian Peng; Chenyang An; Jingbo Shang

arXiv:2410.02284 · cs.CL · October 4, 2024

TL;DR

This paper investigates how similarity among vocabulary keys in language models affects next-token prediction, reveals biases that limit decoding diversity, and proposes an in-context method that navigates away from already-explored keys to improve generation quality.

Contribution

It introduces a novel in-context navigation method that reduces the bias caused by key similarity, improving both the diversity and the accuracy of language model decoding.

Findings

01

Top-ranked tokens are typically accurate, but middle-ranked tokens are biased toward tokens whose keys are similar to those of the top-ranked ones.

02

The proposed method improves decoding diversity and reasoning performance.

03

Navigation away from explored keys enhances generation quality.
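The navigation finding can be illustrated with a toy example. The sketch below is a hypothetical embedding-space analogue, not the authors' in-context procedure: instead of prompting the model with already-explored answers, it directly subtracts the mean of the explored keys from the query vector (`navigate_away`, with an assumed step size `alpha`). The 3-dimensional key space, the vocabulary, the query, and all helper names are invented for illustration: single capital letters share one direction, "P" additionally has its own direction, and the city names "Paris", "Lyon", and "Nice" share a third.

```python
import string


def build_toy_keys():
    """Hypothetical 3-d keys: [single-letter, "P"-specific, city-name] directions."""
    keys = {ch: [1.0, 0.0, 0.0] for ch in string.ascii_uppercase}
    keys["P"] = [1.0, 1.0, 0.3]      # "P" also has its own direction
    keys["Paris"] = [0.0, 0.3, 1.0]  # city name that weakly shares "P"'s direction
    keys["Lyon"] = [0.0, 0.0, 1.0]
    keys["Nice"] = [0.0, 0.0, 0.9]
    return keys


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank(query, keys, exclude=()):
    """Rank tokens by query-key dot product (softmax is monotonic, so the order matches NTP)."""
    scores = {tok: dot(query, k) for tok, k in keys.items() if tok not in exclude}
    # sorted() is stable, so tied tokens keep vocabulary order
    return sorted(scores, key=lambda tok: -scores[tok])


def navigate_away(query, keys, explored, alpha=1.0):
    """Hypothetical query shift: q' = q - alpha * mean(k_t for t in explored)."""
    dim = len(query)
    mean = [sum(keys[tok][i] for tok in explored) / len(explored) for i in range(dim)]
    return [q - alpha * m for q, m in zip(query, mean)]


keys = build_toy_keys()
query = [2.0, 1.0, 1.5]  # hypothetical encoded context
explored = ["P"]         # token already decoded in an earlier round

naive = rank(query, keys, exclude=explored)
shifted = rank(navigate_away(query, keys, explored), keys, exclude=explored)
print(naive[:3])    # ['A', 'B', 'C']
print(shifted[:3])  # ['Paris', 'Lyon', 'Nice']
```

In this toy space, merely excluding "P" leaves the other capital letters at the top, because their keys resemble the explored one; shifting the query away from the explored key lets the city names surface instead. The step size is only an assumption: a larger `alpha` penalizes "Paris" more than "Lyon", since the key of "Paris" shares a component with that of "P".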

Abstract

Language model (LM) decoding is based on the next-token prediction (NTP) probability distribution. For neural LMs (e.g., Transformer-based), the NTP distribution is essentially a softmax-normalized dot product between an encoded input context (query) and fixed vocabulary representations (keys). In this paper, we study the effect of the key distribution on the NTP distribution, with a focus on whether similarity between keys triggers spurious correlations in NTP. Through knowledge-probing tasks, we show that in the NTP distribution, the few top-ranked tokens are typically accurate. However, the middle-ranked predictions are highly biased towards tokens that are distributionally (not necessarily semantically) similar to these top ones. For instance, if "P" is predicted as the top-1 token, "A"-"Z" will all be ranked high in NTP, no matter whether they can lead to correct decoding…
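The query-key view of NTP described in the abstract can be sketched in a few lines. The example below computes p(t) = exp(q·k_t) / Σ_t' exp(q·k_t') over a hypothetical toy vocabulary; the key vectors, the query, and the helper names are invented for illustration and do not come from the paper or its repository. Every single-letter key shares a common direction with the key of "P", while the city name "Paris" only weakly shares P's own direction.

```python
import math
import string


def build_toy_keys():
    """Hypothetical 3-d keys: [single-letter, "P"-specific, city-name] directions."""
    keys = {ch: [1.0, 0.0, 0.0] for ch in string.ascii_uppercase}
    keys["P"] = [1.0, 1.0, 0.3]      # "P" also has its own direction
    keys["Paris"] = [0.0, 0.3, 1.0]  # city name that weakly shares "P"'s direction
    keys["Lyon"] = [0.0, 0.0, 1.0]
    keys["Nice"] = [0.0, 0.0, 0.9]
    return keys


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ntp_distribution(query, keys):
    """Softmax over query-key dot products."""
    logits = {tok: dot(query, k) for tok, k in keys.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {tok: math.exp(z - m) for tok, z in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def ranked(dist):
    # sorted() is stable, so tokens with equal probability keep vocabulary order
    return sorted(dist, key=lambda tok: -dist[tok])


keys = build_toy_keys()
query = [2.0, 1.0, 1.5]  # hypothetical encoded context
dist = ntp_distribution(query, keys)
order = ranked(dist)
print(order[:3], order.index("Paris") + 1)  # ['P', 'A', 'B'] 27
```

All 25 remaining letters land directly below "P", ahead of "Paris", purely because their keys resemble the key of the top-1 token. This mirrors the bias the abstract describes, although real LM key spaces are high-dimensional and learned rather than hand-set.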

Code & Models

Repositories

KomeijiForce/KeyNavi (PyTorch, official)
