FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence

Xinyu Yan; Boyang Chen; Jiaming Zhang; Tiantong Wu; Hong Xi Tae; Yichen He; Tiantong Wang; Yachun Mi; Yurong Hao; Yilei Zhao; Lei Xiao; Longtao Huang; Pengjun Xie; Wei Liu; Wei Yang Bryan Lim

arXiv:2605.08820 · cs.CV · May 12, 2026

TL;DR

FraudBench is a new multimodal benchmark for evaluating how well AI models and humans detect AI-generated fraudulent refund evidence across e-commerce, food delivery, and travel-service scenarios.

Contribution

The paper introduces FraudBench, a comprehensive benchmark dataset for claim-conditioned detection of AI-generated fraudulent evidence, combining real and synthetic images with metadata.

Findings

01. Current models recognize real damage but struggle to detect fake damage.

02. Fake-damage detection rates often fall below 50% on most generators.

03. Specialized detectors outperform generic models but remain inconsistent.
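
To make the per-generator detection rate in the findings concrete, here is a minimal sketch of how such a rate could be computed. It assumes the rate means the fraction of AI-generated (fake-damage) images that a detector flags as fake, grouped by the generator that produced them. The function name, the generator labels, and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def detection_rates(records):
    """Return, per generator, the fraction of fake images flagged as fake.

    `records` is an iterable of (generator_name, predicted_fake) pairs,
    where every image is assumed to be AI-generated (ground truth: fake).
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for generator, predicted_fake in records:
        total[generator] += 1
        if predicted_fake:
            flagged[generator] += 1
    return {g: flagged[g] / total[g] for g in total}


# Toy, made-up predictions for two hypothetical generators.
toy_records = [
    ("generator_a", True), ("generator_a", False),
    ("generator_a", False), ("generator_a", False),
    ("generator_b", True), ("generator_b", True),
    ("generator_b", False), ("generator_b", True),
]

rates = detection_rates(toy_records)
# Generators on which the detector catches fewer than half of the fakes.
below_half = sorted(g for g, r in rates.items() if r < 0.5)
print(rates, below_half)
```

Under this reading, a rate below 0.5 means the detector misses more fake-damage images from that generator than it catches.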

Abstract

Artificial Intelligence (AI)-generated images have become increasingly realistic and readily adaptable to concrete real-world claims, creating new challenges for verifying visual evidence. A concrete emerging risk is AI-generated refund fraud, in which manipulated or synthetic images are used to support claims about damaged products, poor delivery conditions, or service-related defects. Existing AI-generated image detection benchmarks mainly evaluate standalone authenticity classification, cross-generator transfer, or forensic localization, leaving claim-conditioned fraudulent evidence detection underexplored. To bridge this gap, we introduce FraudBench, a multimodal benchmark for detecting AI-generated fraudulent refund evidence. FraudBench is constructed from real-world user-review evidence across e-commerce, food delivery, and travel-service scenarios. We curate real evidence images…
