The TIP of the Iceberg: Revealing a Hidden Class of Task-in-Prompt Adversarial Attacks on LLMs

Sergey Berezin; Reza Farahbakhsh; Noel Crespi

arXiv:2501.18626·cs.CR·June 3, 2025

Open Access

TL;DR

This paper introduces Task-in-Prompt (TIP) attacks that embed sequence-to-sequence tasks into prompts to bypass safety measures in large language models, revealing significant vulnerabilities.

Contribution

It presents a new class of jailbreak attacks and a benchmark to evaluate their effectiveness against state-of-the-art LLMs.

Findings

01. TIP attacks bypass safeguards in six state-of-the-art LLMs, including GPT-4o and LLaMA 3.2

02. The attacks reveal weaknesses in current LLM safety alignment

03. The results highlight the need for more advanced defense strategies

Abstract

We present a novel class of jailbreak adversarial attacks on LLMs, termed Task-in-Prompt (TIP) attacks. Our approach embeds sequence-to-sequence tasks (e.g., cipher decoding, riddles, code execution) into the model's prompt to indirectly generate prohibited inputs. To systematically assess the effectiveness of these attacks, we introduce the PHRYGE benchmark. We demonstrate that our techniques successfully circumvent safeguards in six state-of-the-art language models, including GPT-4o and LLaMA 3.2. Our findings highlight critical weaknesses in current LLM safety alignments and underscore the urgent need for more sophisticated defence strategies. Warning: this paper contains examples of unethical inquiries used solely for research purposes.
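
To make "sequence-to-sequence task" concrete, the sketch below shows one of the task types the abstract names, cipher decoding, using a simple Caesar shift on a benign string. It is only an illustration of the task category, not the authors' method or any part of the PHRYGE benchmark; the function name `caesar_shift`, the shift value, and the sample text are assumptions chosen for the example.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave other characters unchanged."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


# Hypothetical, benign sample text (not taken from the paper).
plain = "hello world"
encoded = caesar_shift(plain, 3)     # the encoded form a decoding task starts from
decoded = caesar_shift(encoded, -3)  # the sequence-to-sequence step: ciphertext -> plaintext

print(encoded)  # khoor zruog
print(decoded)  # hello world
```

The point of the example is that the plaintext never appears literally in the encoded form; it only emerges once the decoding task is carried out, which is the indirection the abstract describes.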

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques

Methods: LLaMA