PreScam: A Benchmark for Predicting Scam Progression from Early Conversations

Weixiang Sun; Shang Ma; Yiyang Li; Tianyi Ma; Zehong Wang; Colby Nelson; Xusheng Xiao; Yanfang Ye

arXiv:2605.12243 · cs.CL · May 13, 2026

TL;DR

PreScam is a benchmark of 11,573 conversational scam instances, built from user-submitted reports, for evaluating how well language models understand and predict the progression of online scams across multiple turns.

Contribution

The paper introduces PreScam, a dataset of conversational scams spanning 20 categories, structured by a proposed scam kill chain and annotated at the turn level, and benchmarks models on termination prediction and next-action prediction.

Findings

1. Supervised encoders outperform zero-shot LLMs in termination prediction.
2. Next-action prediction is only moderately successful even for strong LLMs.
3. Current models struggle to fully capture scam escalation and manipulation cues.

Abstract

Conversational scams, such as romance and investment scams, are emerging as a major form of online fraud. Unlike one-shot scam lures such as fake lottery or unpaid toll messages, they unfold through multi-turn conversations in which scammers gradually manipulate victims using evolving psychological techniques. However, existing research mainly focuses on static scam detection or synthetic scams, leaving open whether language models can understand how real-world scams progress over time. We introduce PreScam, a benchmark for modeling scam progression from early conversations. Built from user-submitted scam reports, PreScam filters and structures 177,989 raw reports into 11,573 conversational scam instances spanning 20 scam categories. Each instance is hierarchically structured according to the scam lifecycle defined by the proposed scam kill chain, and further annotated at the turn level…
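
To make the abstract's numbers and the "early conversation" framing concrete, here is a minimal Python sketch. The three counts come from the abstract. Everything else is an assumption for illustration only: the `Turn` fields, the `early_prefix` helper, and the idea of taking the first `k` turns as model input are hypothetical and do not reflect PreScam's actual schema or task definition.

```python
from dataclasses import dataclass
from typing import List

# Counts stated in the abstract.
RAW_REPORTS = 177_989
SCAM_INSTANCES = 11_573
CATEGORIES = 20


def retention_rate(kept: int, total: int) -> float:
    """Fraction of raw reports that survive filtering."""
    if total <= 0:
        raise ValueError("total must be positive")
    return kept / total


@dataclass
class Turn:
    # Hypothetical fields; the real turn-level annotations are not shown here.
    speaker: str
    text: str


def early_prefix(turns: List[Turn], k: int) -> List[Turn]:
    """Return the first k turns, a stand-in for an 'early conversation' input."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return turns[:k]


if __name__ == "__main__":
    rate = retention_rate(SCAM_INSTANCES, RAW_REPORTS)
    print(f"Retained {rate:.1%} of raw reports")
    print(f"Mean instances per category: {SCAM_INSTANCES / CATEGORIES:.2f}")

    # Toy conversation, invented purely for illustration.
    convo = [
        Turn("scammer", "Hi, is this Alex?"),
        Turn("victim", "No, wrong number."),
        Turn("scammer", "Sorry! You seem kind, what do you do?"),
    ]
    print(len(early_prefix(convo, 2)), "turns given to the model")
```

Roughly one raw report in fifteen survives filtering, which suggests that most user-submitted reports do not contain a usable multi-turn conversation.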

Code & Models

Datasets

aoiandroid/papers
dataset· 28 dl
