From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation

Najrin Sultana; Md Rafi Ur Rashid; Kang Gu; Shagufta Mehnaz

arXiv:2511.03128 · cs.LG · November 6, 2025

TL;DR

This paper introduces two attack frameworks, Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), that generate adaptive, natural-looking adversarial texts for assessing the robustness of large language models.

Contribution

The attacks run as an automated, LLM-driven pipeline that generates adversarial examples without relying on external heuristics and that evolves as LLMs advance.

Findings

1. Effective generation of natural-looking adversarial texts
2. High transferability of attacks across different LLMs
3. Systematic approach for LLM robustness assessment

Abstract

LLMs can provide substantial zero-shot performance on diverse tasks using a simple task prompt, eliminating the need for training or fine-tuning. However, when applying these models to sensitive tasks, it is crucial to thoroughly assess their robustness against adversarial inputs. In this work, we introduce Static Deceptor (StaDec) and Dynamic Deceptor (DyDec), two innovative attack frameworks designed to systematically generate dynamic and adaptive adversarial examples by leveraging the understanding of the LLMs. We produce subtle and natural-looking adversarial inputs that preserve semantic similarity to the original text while effectively deceiving the target LLM. By utilizing an automated, LLM-driven pipeline, we eliminate the dependence on external heuristics. Our attacks evolve with the advancements in LLMs and demonstrate strong transferability across models unknown to the…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection