Non-Halting Queries: Exploiting Fixed Points in LLMs
Ghaith Hammouri, Kemal Derya, Berk Sunar

TL;DR
This paper uncovers a vulnerability in large language models: certain crafted queries cause the model to never sample the end-of-string token, so generation never halts. The result is a security concern and exposes weaknesses in model alignment.
Contribution
The paper introduces a novel non-halting vulnerability in LLMs, analyzes its conditions, and develops practical prompts and inversion techniques to reliably induce non-halting behavior.
Findings
Non-halting queries can be reliably induced in various models.
A simple prompt recipe causes high success rates across models.
Non-halting behavior is prevalent and easily triggered with few tokens.
Abstract
We introduce a new vulnerability that exploits fixed points in autoregressive models and use it to craft queries that never halt. More precisely, for non-halting queries, the LLM never samples the end-of-string token <eos>. We rigorously analyze the conditions under which the non-halting anomaly presents itself. In particular, at temperature zero, we prove that if a repeating (cyclic) token sequence is observed at the output beyond the context size, then the LLM does not halt. We demonstrate non-halting queries in many experiments performed in base unaligned models where repeating prompts immediately lead to a non-halting cyclic behavior as predicted by the analysis. Further, we develop a simple recipe that takes the same fixed points observed in the base model and creates a prompt structure to target aligned models. We demonstrate the recipe's success in sending every major model…
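The temperature-zero claim in the abstract can be illustrated with a small sketch. This is not the authors' code. It assumes a hypothetical toy model whose next token is a deterministic function of the last `context_size` tokens, which is the setting the abstract describes (greedy decoding, finite context). The names `toy_next_token`, `greedy_generate`, and `cycle_period_beyond_context` are invented for illustration. The idea: once the output has repeated with period p over at least `context_size + p` tokens, every context window the model will ever see has already been observed together with its non-`<eos>` successor, so the generation loops forever.

```python
from typing import Callable, List, Optional, Sequence, Tuple

EOS = "<eos>"


def toy_next_token(window: Tuple[str, ...]) -> str:
    """Hypothetical deterministic 'model': a -> b -> c -> a, anything else halts."""
    cycle = {"a": "b", "b": "c", "c": "a"}
    return cycle.get(window[-1], EOS) if window else EOS


def greedy_generate(
    next_token: Callable[[Tuple[str, ...]], str],
    prompt: Sequence[str],
    context_size: int,
    max_new_tokens: int,
) -> List[str]:
    """Temperature-zero decoding: the next token depends only on the last
    `context_size` tokens. Stops at <eos> or after `max_new_tokens`."""
    tokens = list(prompt)
    output: List[str] = []
    for _ in range(max_new_tokens):
        window = tuple(tokens[-context_size:])
        tok = next_token(window)
        if tok == EOS:
            break
        tokens.append(tok)
        output.append(tok)
    return output


def cycle_period_beyond_context(
    output: Sequence[str], context_size: int
) -> Optional[int]:
    """Return the smallest period p such that the last `context_size + p`
    output tokens repeat with period p, or None if no such p exists.

    Under the determinism assumption above, a non-None result means the
    current context window equals one seen p steps earlier, so the same
    p tokens will be emitted again indefinitely and <eos> is never reached."""
    n = len(output)
    for p in range(1, n - context_size + 1):
        tail = output[-(context_size + p):]
        if all(tail[i] == tail[i + p] for i in range(context_size)):
            return p
    return None


if __name__ == "__main__":
    out = greedy_generate(toy_next_token, ["a"], context_size=4, max_new_tokens=20)
    print(cycle_period_beyond_context(out, context_size=4))  # prints 3
```

The check only certifies non-halting when sampling is deterministic. At nonzero temperature an observed cycle does not by itself rule out a later `<eos>`.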
Taxonomy
Topics: Advanced Data Storage Technologies
Methods: Balanced Selection
