Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice

Andreas Grivas; Nikolay Bogoychev; Adam Lopez

arXiv:2203.06462·cs.LG·March 22, 2022

TL;DR

This paper investigates the theoretical possibility of unargmaxable classes in low-rank softmax layers of NLP models, finds they are rare in large models, and provides algorithms to detect such tokens.

Contribution

It introduces algorithms to identify unargmaxable tokens in large language and translation models, demonstrating their rarity in practical settings.

Findings

01. 13 out of 150 models have unargmaxable tokens

02. Unargmaxable tokens are very infrequent in large models

03. Detection algorithms are publicly released

Abstract

Classifiers in natural language processing (NLP) often have a large number of output classes. For example, neural language models (LMs) and machine translation (MT) models both predict tokens from a vocabulary of thousands. The Softmax output layer of these models typically receives as input a dense feature representation, which has much lower dimensionality than the output. In theory, the result is some words may be impossible to be predicted via argmax, irrespective of input features, and empirically, there is evidence this happens in small language models. In this paper we ask whether it can happen in practical large language models and translation models. To do so, we develop algorithms to detect such unargmaxable tokens in public models. We find that 13 out of 150 models do indeed have such tokens; however, they are very infrequent and unlikely to impact model quality. We…
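To make the abstract's central claim concrete, here is a minimal sketch in Python with NumPy. It is not the authors' released detection algorithm. The weight matrix `W`, the helper `argmaxable_classes`, and the sampling approach are all assumptions made for illustration. The toy layer has four classes and two input features, with no bias term, and the fourth class's weight vector lies strictly inside the triangle formed by the other three. Its logit is therefore always a weighted average of the other three logits and can never be the strict maximum.

```python
import numpy as np


def argmaxable_classes(W, n_samples=100_000, seed=0):
    """Return the class indices that win the argmax for at least one sampled input.

    Hypothetical helper for illustration: it samples random feature vectors,
    so it can confirm that a class IS argmaxable, but a class missing from the
    result is only suspected, not proven, to be unargmaxable.
    """
    rng = np.random.default_rng(seed)
    n_features = W.shape[1]
    X = rng.standard_normal((n_samples, n_features))  # random input features
    logits = X @ W.T                                  # shape (n_samples, n_classes)
    winners = logits.argmax(axis=1)                   # softmax preserves the argmax
    return set(np.unique(winners).tolist())


# Toy output layer: 4 classes, 2 features (assumed values, not from the paper).
# Row 3 equals 0.4 * row 0 + 0.35 * row 1 + 0.25 * row 2, a convex combination.
W = np.array([
    [1.0, 0.0],
    [-1.0, 1.0],
    [-1.0, -1.0],
    [-0.2, 0.1],
])

found = argmaxable_classes(W)
print("classes that win at least once:", sorted(found))
print("classes never selected:", sorted(set(range(W.shape[0])) - found))
```

Sampling like this only gives evidence in one direction. An exact check needs a proof that no input makes the class win, which is the kind of question the paper's detection algorithms are designed to answer.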

Code & Models

Repositories

andreasgrv/unargmaxable (Official)

Taxonomy

Methods: Softmax