Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Zi Wang; Divyam Anshumaan; Ashish Hooda; Yudong Chen; Somesh Jha

arXiv:2410.04234 · cs.LG · February 18, 2025

TL;DR

This paper introduces the functional homotopy method, which optimizes discrete language-model inputs by solving a series of easy-to-hard optimization problems, and reports a 20%-30% improvement in jailbreak attack success rate over existing methods on Llama-2 and Llama-3.

Contribution

The study proposes the functional homotopy approach, leveraging functional duality and homotopy principles to optimize discrete inputs for LLM jailbreak attacks.

Findings

01. Achieved 20-30% higher success rates in jailbreak attacks.

02. Demonstrated effectiveness on Llama-2 and Llama-3 models.

03. Introduced a new continuous optimization framework for discrete input problems.

Abstract

Optimization methods are widely employed in deep learning to identify and mitigate undesired model responses. While gradient-based techniques have proven effective for image models, their application to language models is hindered by the discrete nature of the input space. This study introduces a novel optimization approach, termed the functional homotopy method, which leverages the functional duality between model training and input generation. By constructing a series of easy-to-hard optimization problems, we iteratively solve these problems using principles derived from established homotopy methods. We apply this approach to jailbreak attack synthesis for large language models (LLMs), achieving a 20%-30% improvement in success rate over existing methods in circumventing established safe open-source models such as Llama-2 and Llama-3.
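The easy-to-hard idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the loss family, the `TARGET` tuple, the scalar level `p`, and the helper names `coordinate_search` and `homotopy_search` are all hypothetical, and `p` merely stands in for whatever continuous parameters the method actually varies. The sketch only shows the general homotopy pattern of solving an easy discrete problem first and warm-starting each harder one from the previous solution.

```python
from typing import Sequence, Tuple

Tokens = Tuple[int, ...]

VOCAB = range(8)                   # hypothetical token ids
TARGET: Tokens = (3, 1, 4, 1, 5)   # hypothetical optimum of the toy problem


def loss(x: Tokens, p: float) -> float:
    """Toy loss family (an assumption): p=0 is smooth and easy,
    p=1 is a flat 'needle in a haystack' with no local signal."""
    smooth = sum(abs(a - b) for a, b in zip(x, TARGET))
    needle = 0.0 if x == TARGET else 1.0
    return (1.0 - p) * smooth + p * needle


def coordinate_search(x: Tokens, p: float) -> Tokens:
    """Greedy discrete search: change one position at a time and
    keep any change that strictly lowers the loss at level p."""
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            for tok in VOCAB:
                cand = x[:i] + (tok,) + x[i + 1:]
                if loss(cand, p) < loss(x, p):
                    x, improved = cand, True
    return x


def homotopy_search(x0: Tokens, levels: Sequence[float]) -> Tokens:
    """Solve the problems from easy to hard, warm-starting each
    level from the solution of the previous one."""
    x = x0
    for p in levels:
        x = coordinate_search(x, p)
    return x


start: Tokens = (0, 0, 0, 0, 0)
direct = coordinate_search(start, 1.0)             # hard problem only
staged = homotopy_search(start, [0.0, 0.5, 1.0])   # easy-to-hard schedule
```

In this toy setting the direct search never leaves `start`, because no single-position change lowers the flat hard loss, while the staged search reaches `TARGET` on the easy level and carries that solution through to the hard one.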

Taxonomy

Topics: Imbalanced Data Classification Techniques