Distinguishing Scams and Fraud with Ensemble Learning
Isha Chadalavada, Tianhui Huang, Jessica Staddon

TL;DR
This paper introduces an ensemble learning method that uses large language models (LLMs) to distinguish scam complaints from non-scam fraud complaints in the Consumer Financial Protection Bureau (CFPB) complaints database, with the aim of improving scam detection accuracy.
Contribution
It presents a novel LLM ensemble approach specifically designed for distinguishing scam complaints from other fraud reports in a financial complaints dataset.
Findings
An ensemble of LLMs outperforms individual models in scam detection.
Identifies strengths and weaknesses of LLMs in scam classification.
Provides initial evaluation results on CFPB complaints data.
Abstract
Users increasingly query LLM-enabled web chatbots for help with scam defense. The Consumer Financial Protection Bureau's complaints database is a rich data source for evaluating LLM performance on user scam queries, but currently the corpus does not distinguish between scam and non-scam fraud. We developed an LLM ensemble approach to distinguishing scam and fraud CFPB complaints and describe initial findings regarding the strengths and weaknesses of LLMs in the scam defense context.
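The abstract does not say how the ensemble combines the individual models' outputs. Majority voting is one common choice, so the sketch below assumes it. It is an illustration of that technique and not the authors' method. The label strings, the `ensemble_label` function, and the stub classifiers are all hypothetical. The stubs stand in for prompted LLM calls that would each return one label per complaint.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical label set; the paper's actual labels and prompts are not given here.
SCAM = "scam"
FRAUD = "non-scam fraud"
UNDECIDED = "undecided"

# A classifier maps complaint text to a single label (e.g. a wrapped LLM prompt).
Classifier = Callable[[str], str]


def ensemble_label(complaint: str, classifiers: Sequence[Classifier]) -> str:
    """Return the majority label across classifiers, or UNDECIDED on a tie."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = Counter(clf(complaint) for clf in classifiers)
    ranked = votes.most_common()
    # If the top two labels have equal vote counts, there is no majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return UNDECIDED
    return ranked[0][0]


if __name__ == "__main__":
    # Stub "models" with fixed outputs, standing in for real LLM calls.
    def says_scam(text: str) -> str:
        return SCAM

    def says_fraud(text: str) -> str:
        return FRAUD

    complaint = "A caller posing as my bank told me to move my savings."
    print(ensemble_label(complaint, [says_scam, says_scam, says_fraud]))  # scam
```

Returning an explicit "undecided" label on ties, rather than breaking ties arbitrarily, is a design choice assumed here. It lets ambiguous complaints be set aside for manual review. An odd number of voters avoids ties for a two-label task.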
Taxonomy
Topics: Imbalanced Data Classification Techniques
