Proactive Hardening of LLM Defenses with HASTE
Henry Chen, Victor Aranda, Samarth Keshari, Ryan Heartfield, Nicole Nichols

TL;DR
HASTE is a systematic framework that proactively and reactively enhances LLM defenses by generating adaptive attack prompts, significantly improving prompt detection and hardening strategies against prompt-based attacks.
Contribution
The paper introduces HASTE, a modular framework for generating evasive prompts to improve prompt-based attack detection and defense in LLMs, with demonstrated effectiveness.
Findings
Reduces malicious prompt detection by approximately 64% with hard negative mining.
Optimizes prompt detection models with fewer iteration loops.
Supports both proactive stress-testing and reactive attack modeling.
Abstract
Prompt-based attack techniques are one of the primary challenges in securely deploying and protecting LLM-based AI systems. LLM inputs are an unbounded, unstructured space. Consequently, effectively defending against these attacks requires proactive hardening strategies capable of continuously generating adaptive attack vectors to optimize LLM defense at runtime. We present HASTE (Hard-negative Attack Sample Training Engine): a systematic framework that iteratively engineers highly evasive prompts, within a modular optimization process, to continuously enhance detection efficacy for prompt-based attack techniques. The framework is agnostic to synthetic data generation methods, and can be generalized to evaluate prompt-injection detection efficacy, with and without fuzzing, for any hard-negative or hard-positive iteration strategy. Experimental evaluation of HASTE shows that hard…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Security and Verification in Computing
