Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
Zi Wang, Divyam Anshumaan, Ashish Hooda, Yudong Chen, Somesh Jha

TL;DR
This paper introduces the functional homotopy method, which smooths discrete input optimization for language models through continuous parameters, and reports jailbreak attack success rates 20-30% higher than existing methods.
Contribution
The study proposes the functional homotopy approach, leveraging functional duality and homotopy principles to optimize discrete inputs for LLM jailbreak attacks.
Findings
Achieved 20-30% higher jailbreak attack success rates than existing methods.
Demonstrated effectiveness on Llama-2 and Llama-3 models.
Introduced a new continuous optimization framework for discrete input problems.
Abstract
Optimization methods are widely employed in deep learning to identify and mitigate undesired model responses. While gradient-based techniques have proven effective for image models, their application to language models is hindered by the discrete nature of the input space. This study introduces a novel optimization approach, termed the functional homotopy method, which leverages the functional duality between model training and input generation. We construct a series of easy-to-hard optimization problems and solve them iteratively using principles derived from established homotopy methods. We apply this approach to jailbreak attack synthesis for large language models (LLMs), achieving a 20%-30% improvement in success rate over existing methods in circumventing established safe open-source models such as Llama-2 and Llama-3.
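The easy-to-hard idea in the abstract can be illustrated with a generic homotopy (continuation) loop on a toy discrete problem. The sketch below is not the authors' algorithm and involves no language model: the bit-string search space, the linear interpolation between losses, and the names `local_search`, `homotopy_solve`, `easy_loss`, `hard_loss`, and `TARGET` are all assumptions chosen for illustration. It only shows how a solution found on an easy problem can warm-start progressively harder ones.

```python
# Toy illustration of a homotopy (continuation) scheme over a discrete space.
# Everything here is hypothetical and unrelated to any real model or attack.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # hidden optimum of the toy problem


def easy_loss(x):
    """Hamming distance to TARGET: every correct bit flip lowers the loss."""
    return float(sum(a != b for a, b in zip(x, TARGET)))


def hard_loss(x):
    """Needle-in-a-haystack loss: flat everywhere except at TARGET."""
    return 0.0 if x == TARGET else 1.0


def local_search(x, loss):
    """Greedy coordinate flips; keep a flip only if it strictly lowers the loss."""
    x = list(x)
    best = loss(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                # try flipping bit i
            cand = loss(x)
            if cand < best:
                best, improved = cand, True
            else:
                x[i] ^= 1            # revert the flip
    return x, best


def homotopy_solve(easy, hard, n_bits, levels=4):
    """Solve a chain of problems from easy (t=0) to hard (t=1), warm-starting each."""
    x = [0] * n_bits
    best = None
    for k in range(levels + 1):
        t = k / levels
        # Interpolated objective at homotopy level t (an assumed, linear schedule).
        loss_t = lambda z, t=t: (1 - t) * easy(z) + t * hard(z)
        x, best = local_search(x, loss_t)
    return x, best


if __name__ == "__main__":
    print("direct on hard loss:", local_search([0] * len(TARGET), hard_loss))
    print("homotopy path:      ", homotopy_solve(easy_loss, hard_loss, len(TARGET)))
```

In this toy setting, greedy search applied directly to the hard loss stalls because no single flip improves it, while the continuation path reaches the optimum because the earlier, easier levels already place the search at a good starting point.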
Taxonomy
Topics: Imbalanced Data Classification Techniques
