The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang, Siying Hu

TL;DR
This paper investigates the risks of multi-turn conversational scams in large language models, revealing escalation patterns, defense strategies, and failure modes through systematic simulation and analysis.
Contribution
It introduces a controlled LLM-to-LLM simulation framework to study multi-turn scam interactions and identifies key patterns and failure points in LLM safety mechanisms.
Findings
Scam interactions follow recurrent escalation patterns.
Defenses use verification and delay mechanisms.
Interactional failures often stem from safety guardrail activation and role instability.
Abstract
As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that single-turn safety evaluations fail to capture. We systematically study these risks using a controlled LLM-to-LLM simulation framework across multi-turn scam scenarios. Evaluating eight state-of-the-art models in English and Chinese, we analyze dialogue outcomes and qualitatively annotate attacker strategies, defensive responses, and failure modes. Results reveal that scam interactions follow recurrent escalation patterns, while defenses employ verification and delay mechanisms. Furthermore, interactional failures frequently stem from safety guardrail activation and role instability. Our findings highlight multi-turn interactional safety as a critical, distinct dimension of LLM behavior.
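The abstract does not describe how the LLM-to-LLM simulation is implemented. As a rough illustration only, a two-agent multi-turn loop might be sketched as below. All names here (`simulate`, `scripted_attacker`, `cautious_defender`, `keyword_judge`) are hypothetical, the agents are scripted stubs standing in for model calls, and the keyword-based outcome check is an assumption rather than the paper's annotation method.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# One dialogue entry is (role, utterance).
Turn = Tuple[str, str]
# An agent maps the dialogue history to its next utterance.
# In a real study this would wrap an LLM call; here it is a stub (assumption).
Agent = Callable[[List[Turn]], str]
# A judge inspects the defender's reply and returns an outcome label or None.
Judge = Callable[[str], Optional[str]]


@dataclass
class DialogueResult:
    history: List[Turn] = field(default_factory=list)
    outcome: str = "unresolved"


def simulate(attacker: Agent, defender: Agent, judge: Judge,
             max_turns: int = 5) -> DialogueResult:
    """Alternate attacker and defender turns until the judge decides or turns run out."""
    result = DialogueResult()
    for _ in range(max_turns):
        result.history.append(("attacker", attacker(result.history)))
        reply = defender(result.history)
        result.history.append(("defender", reply))
        verdict = judge(reply)
        if verdict is not None:
            result.outcome = verdict
            break
    return result


def scripted_attacker(history: List[Turn]) -> str:
    """Hypothetical attacker that escalates through a fixed script."""
    script = [
        "Hello, this is your bank.",
        "There is a problem with your account.",
        "Please send your code now.",
    ]
    sent = sum(1 for role, _ in history if role == "attacker")
    return script[min(sent, len(script) - 1)]


def cautious_defender(history: List[Turn]) -> str:
    """Hypothetical defender that asks for verification twice, then refuses."""
    replied = sum(1 for role, _ in history if role == "defender")
    if replied < 2:
        return "Can you verify your identity first?"
    return "I refuse to share that."


def keyword_judge(reply: str) -> Optional[str]:
    """Toy outcome check based on keywords (an assumption, not the paper's method)."""
    text = reply.lower()
    if "refuse" in text:
        return "defended"
    if "my code is" in text:
        return "compromised"
    return None


if __name__ == "__main__":
    result = simulate(scripted_attacker, cautious_defender, keyword_judge)
    for role, utterance in result.history:
        print(f"{role}: {utterance}")
    print("outcome:", result.outcome)
```

Keeping the agents as plain callables over the shared history means the same loop can pair any attacker with any defender, which is one simple way to hold the scenario fixed while varying the models under test.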
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Stalking, Cyberstalking, and Harassment · Advanced Malware Detection Techniques
