AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions
Waligóra Witold

TL;DR
AnomaLLMy is a new method that detects anomalous tokens in black-box LLMs by analyzing low-confidence single-token predictions, with the aim of improving model reliability and informing tokenizer development.
Contribution
It introduces a cost-effective approach for anomaly detection in black-box LLMs using low-confidence predictions, validated on GPT-4 token data.
Findings
Detected 413 major anomalies and 65 minor anomalies in GPT-4 tokens.
Achieved anomaly detection with only $24.39 in API credits.
The results are expected to help improve LLM robustness and accuracy, particularly in the development and assessment of tokenizers.
Abstract
This paper introduces AnomaLLMy, a novel technique for the automatic detection of anomalous tokens in black-box Large Language Models (LLMs) with API-only access. Utilizing low-confidence single-token predictions as a cost-effective indicator, AnomaLLMy identifies irregularities in model behavior, addressing the issue of anomalous tokens degrading the quality and reliability of models. Validated on the cl100k_base dataset, the token set of GPT-4, AnomaLLMy detected 413 major and 65 minor anomalies, demonstrating the method's efficiency with just $24.39 spent in API credits. The insights from this research are expected to be beneficial for enhancing the robustness and accuracy of LLMs, particularly in the development and assessment of tokenizers.
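The idea of using a low-confidence single-token prediction as an anomaly signal can be illustrated with a minimal Python sketch. This is not the paper's implementation: the prompt wording, the `query` callable (a stand-in for whatever API call returns top next-token log-probabilities), and both thresholds are assumptions made here for illustration only.

```python
import math
from typing import Callable, Dict, Iterable, List

# Hypothetical interface: takes a prompt, returns the model's top candidate
# next tokens mapped to their log-probabilities. In practice this would wrap
# a black-box API call; here it is injected so the sketch stays offline.
TopLogprobs = Callable[[str], Dict[str, float]]

# Assumed thresholds, chosen for illustration; not taken from the paper.
MAJOR_THRESHOLD = 0.10
MINOR_THRESHOLD = 0.50


def top_confidence(logprobs: Dict[str, float]) -> float:
    """Return the probability of the most likely single next token."""
    if not logprobs:
        return 0.0
    return math.exp(max(logprobs.values()))


def classify_token(token: str, query: TopLogprobs) -> str:
    """Label a token 'major', 'minor', or 'normal' by prediction confidence."""
    # Assumed prompt: ask the model to repeat the token and inspect how
    # confident it is in its single-token answer.
    prompt = f"Repeat this token exactly: {token}"
    confidence = top_confidence(query(prompt))
    if confidence < MAJOR_THRESHOLD:
        return "major"
    if confidence < MINOR_THRESHOLD:
        return "minor"
    return "normal"


def scan(tokens: Iterable[str], query: TopLogprobs) -> Dict[str, List[str]]:
    """Group tokens by anomaly label."""
    report: Dict[str, List[str]] = {"major": [], "minor": [], "normal": []}
    for token in tokens:
        report[classify_token(token, query)].append(token)
    return report
```

Because each token costs one short request with a single output token, a scan of this shape keeps API spend low, which is consistent with the small budget the abstract reports.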
