What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee

TL;DR
This paper empirically studies long-context vulnerabilities in LLMs and finds that attack success depends primarily on context length rather than on carefully crafted harmful content, exposing safety limitations in current models.
Contribution
It provides a comprehensive analysis of long-context attack vulnerabilities in LLMs, highlighting the primary role of context length and the ineffectiveness of current safety measures at large context scales.
Findings
Longer contexts increase attack success rates.
Repetitive or dummy text can bypass safety measures.
Safety behavior becomes inconsistent with increased context length.
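One way to check the first finding on your own evaluation logs is to group attack outcomes by context length and compare success rates per group. The sketch below is a minimal illustration, not the authors' evaluation code: the function name, the (context_tokens, succeeded) record schema, and the bucket edges are all assumptions introduced here.

```python
from collections import defaultdict


def attack_success_rate_by_length(records, bucket_edges):
    """Compute attack success rate per context-length bucket.

    records: iterable of (context_tokens, succeeded) pairs (hypothetical schema).
    bucket_edges: ascending upper bounds in tokens; a record falls in the
    first bucket whose edge is >= its context length.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for context_tokens, succeeded in records:
        for edge in bucket_edges:
            if context_tokens <= edge:
                totals[edge] += 1
                successes[edge] += int(succeeded)
                break
        # Records longer than the last edge are ignored in this sketch.
    # Only report buckets that actually received records.
    return {e: successes[e] / totals[e] for e in bucket_edges if totals[e]}


# Made-up illustrative records, not results from the paper.
example_records = [
    (1_000, False),
    (1_500, False),
    (60_000, True),
    (100_000, True),
    (120_000, False),
]
rates = attack_success_rate_by_length(example_records, [2_048, 65_536, 131_072])
```

A rising rate across buckets in real logs would be consistent with the finding; the example records above only demonstrate the bookkeeping.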
Abstract
We investigate long-context vulnerabilities in Large Language Models (LLMs) through Many-Shot Jailbreaking (MSJ). Our experiments use context lengths of up to 128K tokens. Through comprehensive analysis of many-shot attack settings that vary instruction style, shot density, topic, and format, we reveal that context length is the primary factor determining attack effectiveness. Critically, we find that successful attacks do not require carefully crafted harmful content. Even repetitive shots or random dummy text can circumvent model safety measures, suggesting fundamental limitations in the long-context processing capabilities of LLMs. The safety behavior of well-aligned models becomes increasingly inconsistent with longer contexts. These findings highlight significant safety gaps in the context expansion capabilities of LLMs, emphasizing the need for new safety mechanisms.
Taxonomy
Topics: Information and Cyber Security · Privacy-Preserving Technologies in Data
