FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks
Bocheng Chen, Hanqing Guo, Qiben Yan

TL;DR
This paper introduces a novel black-box defense method for large language models that dynamically adjusts decoding hyperparameters and prompts to effectively counter jailbreak attacks without requiring internal model access or additional training.
Contribution
It proposes a moving target defense approach that modifies decoding hyperparameters and prompts during runtime to improve robustness against jailbreak attacks in black-box LLMs.
Findings
Most effective against jailbreaks in tested models
Lower inference costs compared to other defenses
Maintains comparable response quality
Abstract
Defense in large language models (LLMs) is crucial to counter the numerous attackers exploiting these systems to generate harmful content through manipulated prompts, known as jailbreak attacks. Although many defense strategies have been proposed, they often require access to the model's internal structure or need additional training, which is impractical for service providers using LLM APIs, such as OpenAI APIs or Claude APIs. In this paper, we propose a moving target defense approach that alters decoding hyperparameters to enhance model robustness against various jailbreak attacks. Our approach does not require access to the model's internal structure and incurs no additional training costs. The proposed defense includes two key components: (1) optimizing the decoding strategy by identifying and adjusting decoding hyperparameters that influence token generation probabilities, and (2)…
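As a rough illustration of the moving-target idea described above, the sketch below re-samples decoding hyperparameters and a system prompt on every request, so an attacker cannot tune a jailbreak against one fixed configuration. It is a minimal sketch under stated assumptions, not the authors' implementation: the candidate values, the names `sample_config` and `defended_query`, and the `generate` callable (a stand-in for whatever black-box LLM API is in use) are all hypothetical, and the paper's actual search space and selection procedure are not reproduced here.

```python
import random
from typing import Callable, Dict, Optional

# Hypothetical candidate pools; the paper's real search space is not given here.
TEMPERATURES = [0.3, 0.7, 1.0]
TOP_PS = [0.7, 0.9, 1.0]
SYSTEM_PROMPTS = [
    "You are a helpful assistant.",
    "You are a careful assistant. Decline harmful requests.",
]


def sample_config(rng: random.Random) -> Dict[str, object]:
    """Draw one decoding configuration uniformly at random from the pools."""
    return {
        "temperature": rng.choice(TEMPERATURES),
        "top_p": rng.choice(TOP_PS),
        "system_prompt": rng.choice(SYSTEM_PROMPTS),
    }


def defended_query(
    user_prompt: str,
    generate: Callable[..., str],
    rng: Optional[random.Random] = None,
) -> str:
    """Send one request with a freshly sampled configuration.

    `generate` is an assumed wrapper around a black-box LLM API that accepts
    system_prompt, user_prompt, temperature, and top_p keyword arguments.
    """
    rng = rng or random.Random()
    cfg = sample_config(rng)
    return generate(
        system_prompt=cfg["system_prompt"],
        user_prompt=user_prompt,
        temperature=cfg["temperature"],
        top_p=cfg["top_p"],
    )
```

Only request-level parameters are touched, which matches the black-box setting: no model weights, logits, or training are needed. Uniform sampling is used purely for simplicity; a real deployment would presumably restrict the pools to configurations already checked for robustness and response quality.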
Taxonomy
Topics: Network Security and Intrusion Detection · Information and Cyber Security · Advanced Malware Detection Techniques
