AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions

Waligóra Witold

arXiv:2406.19840·cs.CL·July 1, 2024

TL;DR

AnomaLLMy detects anomalous tokens in black-box LLMs with API-only access by flagging low-confidence single-token predictions, with the aim of supporting model reliability and tokenizer development.

Contribution

The paper introduces a cost-effective approach to detecting anomalous tokens in black-box LLMs based on low-confidence predictions, validated on cl100k_base, the token set of GPT-4.

Findings

01 Detected 413 major and 65 minor anomalies among the GPT-4 (cl100k_base) tokens.

02 Completed the detection run with only $24.39 in API credits.

03 The results are expected to help improve the robustness and accuracy of LLMs, particularly in the development and assessment of tokenizers.

Abstract

This paper introduces AnomaLLMy, a novel technique for the automatic detection of anomalous tokens in black-box Large Language Models (LLMs) with API-only access. Utilizing low-confidence single-token predictions as a cost-effective indicator, AnomaLLMy identifies irregularities in model behavior, addressing the issue of anomalous tokens degrading the quality and reliability of models. Validated on the cl100k_base dataset, the token set of GPT-4, AnomaLLMy detected 413 major and 65 minor anomalies, demonstrating the method's efficiency with just $24.39 spent in API credits. The insights from this research are expected to be beneficial for enhancing the robustness and accuracy of LLMs, particularly in the development and assessment of tokenizers.
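The idea of using a low-confidence single-token prediction as an anomaly signal can be illustrated with a minimal sketch. It assumes the API returns, for a one-token completion, a mapping from candidate tokens to log-probabilities. The function names, the two thresholds, and the rule that separates "major" from "minor" are hypothetical choices made for illustration; the abstract does not specify them, and this is not the paper's implementation.

```python
import math

def top_probability(top_logprobs):
    """Return (token, probability) of the most likely candidate.

    top_logprobs: dict mapping candidate token strings to log-probabilities,
    as an API might return for a single-token completion (assumed format).
    """
    token, logprob = max(top_logprobs.items(), key=lambda kv: kv[1])
    return token, math.exp(logprob)

def classify(top_logprobs, major_below=0.1, minor_below=0.5):
    """Label one prediction by the confidence of its top candidate.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    _, p = top_probability(top_logprobs)
    if p < major_below:
        return "major"
    if p < minor_below:
        return "minor"
    return "normal"

# Made-up log-probabilities for three hypothetical tokens under test.
responses = {
    "tok_a": {"x": math.log(0.9), "y": math.log(0.1)},
    "tok_b": {"x": math.log(0.3), "y": math.log(0.2)},
    "tok_c": {"x": math.log(0.05), "y": math.log(0.04)},
}
labels = {tok: classify(lp) for tok, lp in responses.items()}
```

Because each token needs only one short request that generates a single output token, a scan of this kind stays cheap, which is consistent with the low API cost reported in the abstract.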

Code & Models

Repositories

wwa/anomallmy (official)

Taxonomy

Topics: Magnetic confinement fusion research

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam