Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

Yik Siu Chan; Narutatsu Ri; Yuxin Xiao; Marzyeh Ghassemi

arXiv:2502.04322 · cs.LG · August 5, 2025

TL;DR

This paper shows that simple multi-step, multilingual interactions can jailbreak large language models into producing responses that help users carry out harmful actions. It introduces HarmScore, a metric for how effectively a response enables such actions, and Speak Easy, an attack framework that exploits this vulnerability.

Contribution

The paper reveals a vulnerability in LLM safety: common, simple human-LLM interactions can be exploited to elicit harmful responses. It proposes a new jailbreak metric (HarmScore) and a new attack framework (Speak Easy).

Findings

01

The Speak Easy framework increases both attack success rate and HarmScore

02

Simple interactions can effectively elicit harmful responses from LLMs

03

Vulnerability exists across open-source and proprietary models

Abstract

Despite extensive safety alignment efforts, large language models (LLMs) remain vulnerable to jailbreak attacks that elicit harmful behavior. While existing studies predominantly focus on attack methods that require technical expertise, two critical questions remain underexplored: (1) Are jailbroken responses truly useful in enabling average users to carry out harmful actions? (2) Do safety vulnerabilities exist in more common, simple human-LLM interactions? In this paper, we demonstrate that LLM responses most effectively facilitate harmful actions when they are both actionable and informative--two attributes easily elicited in multi-step, multilingual interactions. Using this insight, we propose HarmScore, a jailbreak metric that measures how effectively an LLM response enables harmful actions, and Speak Easy, a simple multi-step, multilingual attack framework. Notably, by…

Code & Models

Repositories

yiksiu-chan/SpeakEasy (PyTorch, official)

Videos

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions · SlidesLive

Taxonomy

Topics: Law, Economics, and Judicial Systems · Artificial Intelligence in Law · Law, AI, and Intellectual Property

Methods: Focus