TL;DR
FlashRT is a framework that speeds up optimization-based red-teaming attacks on long-context large language models and lowers their GPU memory use, making security evaluation at scale more practical.
Contribution
It introduces the first efficient framework for resource-conscious red-teaming of long-context LLMs, reducing computation time and GPU memory usage substantially.
Findings
FlashRT achieves 2x-7x speedup over baseline methods.
It reduces GPU memory consumption by 2x-4x for long contexts.
FlashRT is applicable to black-box optimization methods like TAP and AutoDAN.
Abstract
Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to power many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic-based methods and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially in long-context scenarios. This resource-intensive nature poses a major obstacle for the community (especially academic…
