SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning

Peidong Wang; Zhiming Ma; Xin Dai; Yongkang Liu; Shi Feng; Xiaocui Yang; Wenxing Hu; Zhihao Wang; Mingjun Pan; Li Yuan; and Daling Wang

arXiv:2601.01392·cs.SD·January 6, 2026

TL;DR

SAFE-QAQ is an end-to-end audio-based fraud detection framework that improves accuracy and efficiency by eliminating transcription errors and leveraging hierarchical reasoning with reinforcement learning.

Contribution

The paper introduces SAFE-QAQ, a novel end-to-end system that uses reinforcement learning and hierarchical reasoning for real-time audio fraud detection without relying on transcriptions.

Findings

01 Significantly outperforms existing methods in accuracy and efficiency
02 Enables real-time fraud detection during live calls
03 Reduces human workload and financial losses

Abstract

Existing fraud detection methods predominantly rely on transcribed text, so they suffer from ASR errors and miss crucial acoustic cues such as vocal tone and environmental context. This limits their effectiveness against complex deceptive strategies. To address these challenges, we propose SAFE-QAQ, an end-to-end comprehensive framework for audio-based slow-thinking fraud detection. First, the SAFE-QAQ framework eliminates the impact of transcription errors on detection performance. Second, we propose rule-based slow-thinking reward mechanisms that systematically guide the system to identify fraud-indicative patterns by accurately capturing fine-grained audio details through hierarchical reasoning processes. Third, our framework introduces dynamic risk assessment during live calls, enabling early detection and prevention of fraud. Experiments on the…
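The abstract does not specify how the rule-based rewards are computed, so the following is only a minimal sketch of what a rule-based reward for reinforcement learning can look like in general. Everything in it is an assumption made for illustration and is not taken from the paper: the `<think>`/`<answer>` response layout, the label set, the reward weights, and the function name `rule_based_reward`.

```python
import re

# Hypothetical response layout (an assumption, not from the paper):
# reasoning inside <think>...</think>, final verdict inside <answer>...</answer>.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(?P<reasoning>.+?)</think>\s*<answer>(?P<verdict>.+?)</answer>\s*$",
    re.DOTALL,
)

# Hypothetical label set, chosen for illustration only.
VALID_VERDICTS = {"fraud", "normal"}


def rule_based_reward(response: str, gold_label: str) -> float:
    """Score a model response using fixed rules instead of a learned reward model.

    Returns 0.0 if the response is malformed or the verdict is not a known label,
    0.5 if it is well formed but the verdict is wrong,
    and 1.5 if it is well formed and the verdict matches the gold label.
    The weights are illustrative assumptions.
    """
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        return 0.0  # missing or misordered reasoning/answer sections

    verdict = match.group("verdict").strip().lower()
    if verdict not in VALID_VERDICTS:
        return 0.0  # verdict is not one of the allowed labels

    format_reward = 0.5
    accuracy_reward = 1.0 if verdict == gold_label.strip().lower() else 0.0
    return format_reward + accuracy_reward
```

In a sketch like this, a response that reasons first and then answers correctly earns the full reward, while an answer given without the reasoning section earns nothing, which is one simple way rules can encourage a reason-then-decide output.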

Taxonomy

Topics: Music and Audio Processing · Imbalanced Data Classification Techniques · Speech Recognition and Synthesis