The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Xiangzhe Yuan; Zhenhao Zhang; Haoming Tang; Siying Hu

arXiv:2601.03134 · cs.CL · January 7, 2026

Open Access

TL;DR

This paper investigates the risks of multi-turn conversational scams in large language models, revealing escalation patterns, defense strategies, and failure modes through systematic simulation and analysis.

Contribution

It introduces a controlled LLM-to-LLM simulation framework to study multi-turn scam interactions and identifies key patterns and failure points in LLM safety mechanisms.

Findings

01. Scam interactions follow recurrent escalation patterns.

02. Defenses rely on verification and delay mechanisms.

03. Interactional failures often stem from safety guardrail activation and role instability.

Abstract

As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that single-turn safety evaluations fail to capture. We systematically study these risks using a controlled LLM-to-LLM simulation framework across multi-turn scam scenarios. Evaluating eight state-of-the-art models in English and Chinese, we analyze dialogue outcomes and qualitatively annotate attacker strategies, defensive responses, and failure modes. Results reveal that scam interactions follow recurrent escalation patterns, while defenses employ verification and delay mechanisms. Furthermore, interactional failures frequently stem from safety guardrail activation and role instability. Our findings highlight multi-turn interactional safety as a critical, distinct dimension of LLM behavior.
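The abstract does not describe how the simulation framework is implemented, so the following is only a minimal sketch of what a controlled two-agent, multi-turn loop could look like. Everything in it is an assumption made for illustration: the `simulate` function, the `DialogueResult` type, the outcome labels, and the stub agents are hypothetical, and the stubs return canned placeholder strings rather than calling any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# One utterance is stored as a (role, text) pair.
Turn = Tuple[str, str]
# An agent maps the dialogue history so far to its next utterance.
Agent = Callable[[List[Turn]], str]
# A classifier inspects one utterance and returns an outcome label, or None.
Classifier = Callable[[str, str], Optional[str]]


@dataclass
class DialogueResult:
    history: List[Turn] = field(default_factory=list)
    outcome: str = "max_turns_reached"  # hypothetical default label


def simulate(attacker: Agent, target: Agent, classify: Classifier,
             max_turns: int = 5) -> DialogueResult:
    """Alternate attacker and target utterances until an outcome is labeled."""
    result = DialogueResult()
    for _ in range(max_turns):
        for role, agent in (("attacker", attacker), ("target", target)):
            text = agent(result.history)
            result.history.append((role, text))
            label = classify(role, text)
            if label is not None:
                result.outcome = label
                return result
    return result


# Stub agents (assumptions): placeholders standing in for model calls.
def stub_attacker(history: List[Turn]) -> str:
    turn = sum(1 for role, _ in history if role == "attacker") + 1
    return f"[attacker turn {turn}] placeholder request"


def stub_target(history: List[Turn]) -> str:
    turn = sum(1 for role, _ in history if role == "target") + 1
    if turn < 3:
        # Mimics the verification-and-delay style of defense.
        return "Can you verify who you are first?"
    return "I will not continue this conversation."


def keyword_classifier(role: str, text: str) -> Optional[str]:
    # Hypothetical rule: the target ending the dialogue closes the episode.
    if role == "target" and "will not continue" in text:
        return "target_disengaged"
    return None


if __name__ == "__main__":
    result = simulate(stub_attacker, stub_target, keyword_classifier)
    print(result.outcome, len(result.history))
```

In a real study the stub agents would be replaced by model-backed agents and the keyword rule by human or model-assisted annotation; the loop structure is the only part this sketch is meant to illustrate.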

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Stalking, Cyberstalking, and Harassment · Advanced Malware Detection Techniques