Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Nivya Talokar; Ayush K Tarun; Murari Mandal; Maksym Andriushchenko; Antoine Bosselut

arXiv:2602.16346 · cs.CL · May 19, 2026

TL;DR

This paper introduces STING, an automated framework for multi-turn, multilingual red-teaming of LLM agents to measure illicit assistance, revealing higher success rates than single-turn methods and highlighting language resource effects.

Contribution

The paper presents STING, a multi-turn, adaptive red-teaming framework for evaluating illicit task execution in multilingual LLM agents. It addresses a gap left by existing agent-misuse benchmarks, which largely test single-prompt instructions.

Findings

01

STING achieves higher illicit-task completion than single-turn prompts.

02

Illicit task success varies across languages and resource levels.

03

Multilingual evaluation shows that illicit-task success in non-English languages does not vary consistently with language resource level.

Abstract

LLM-based agents execute real-world workflows via tools and memory. These affordances enable ill-intended adversaries to also use these agents to carry out complex misuse scenarios. Existing agent misuse benchmarks largely test single-prompt instructions, leaving a gap in measuring how agents end up helping with harmful or illegal tasks over multiple turns. We introduce STING (Sequential Testing of Illicit N-step Goal execution), an automated red-teaming framework that constructs a step-by-step illicit plan grounded in a benign persona and iteratively probes a target agent with adaptive follow-ups, using judge agents to track phase completion. We further introduce an analysis framework that models multi-turn red-teaming as a time-to-first-jailbreak random variable, enabling analysis tools like discovery curves, hazard-ratio attribution by attack language, and a new metric: Restricted…
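
The time-to-first-jailbreak framing in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input encoding (`None` for a run that was never jailbroken within the turn budget), and the sample data are all assumptions made for illustration. It computes a discovery curve (the fraction of runs jailbroken by each turn) and a per-turn discrete hazard (the fraction of still-unbroken runs that are first jailbroken at that turn).

```python
from typing import List, Optional


def discovery_curve(first_turn: List[Optional[int]], max_turns: int) -> List[float]:
    """Fraction of runs with a first jailbreak at or before turn t, for t = 1..max_turns.

    Hypothetical encoding: each entry is the 1-based turn of the first jailbreak,
    or None if the run reached the turn budget without one.
    """
    if not first_turn:
        raise ValueError("need at least one run")
    n = len(first_turn)
    return [
        sum(1 for x in first_turn if x is not None and x <= t) / n
        for t in range(1, max_turns + 1)
    ]


def discrete_hazard(first_turn: List[Optional[int]], max_turns: int) -> List[float]:
    """Per-turn hazard: first jailbreaks at turn t divided by runs still unbroken before t."""
    hazards = []
    for t in range(1, max_turns + 1):
        # A run is at risk at turn t if it has not been jailbroken before t.
        at_risk = sum(1 for x in first_turn if x is None or x >= t)
        events = sum(1 for x in first_turn if x == t)
        hazards.append(events / at_risk if at_risk else 0.0)
    return hazards


# Illustrative (made-up) data: 8 runs with a budget of 4 turns.
runs = [1, 3, None, 2, None, 3, 1, None]
curve = discovery_curve(runs, max_turns=4)
hazard = discrete_hazard(runs, max_turns=4)
print(curve)
print(hazard)
```

Because every unbroken run here is cut off at the same turn budget, the simple empirical fraction is enough; runs that stop at different turns would call for a censoring-aware estimator such as Kaplan-Meier.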

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Multi-Agent Systems and Negotiation