BadLink: Combining Graph and Information-Theoretical Features for Online Fraud Group Detection

Yikun Ban; Xin Liu; Tianyi Zhang; Ling Huang; Yitao Duan; Xue Liu; Wei Xu

arXiv:1805.10053·cs.CR·June 26, 2018·1 cites

TL;DR

BadLink is a scalable fraud detection framework that combines graph and information-theoretical features to identify online fraud groups effectively, outperforming existing solutions even against camouflaged traffic.

Contribution

It introduces a novel combination of graph and information-theoretical features into a scalable, extensible framework for online fraud group detection.

Findings

01 Achieves state-of-the-art detection accuracy

02 Effective against sophisticated camouflage traffic

03 Supports multimodal datasets with diverse data types

Abstract

Fraud severely hurts many kinds of Internet businesses. Group-based fraud detection is a popular methodology for catching fraudsters, who unavoidably exhibit synchronized behaviors. We combine both graph-based features (e.g., cluster density) and information-theoretical features (e.g., the probability of the similarity) of fraud groups into two intuitive metrics. Based on these metrics, we build an extensible fraud detection framework, BadLink, to support multimodal datasets with different data types and distributions in a scalable way. Experiments on real production workloads, as well as extensive comparison with existing solutions, demonstrate the state-of-the-art performance of BadLink, even with sophisticated camouflage traffic.
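To make the two kinds of features named in the abstract more concrete, here is a minimal Python sketch. It is not BadLink's actual metrics, which the abstract does not define. It assumes a textbook notion of cluster density (the fraction of possible edges present among a candidate group) and uses self-information, -log2(p), as a generic stand-in for an information-theoretical feature. The function names `edge_density` and `surprisal_bits`, and the toy accounts, are hypothetical.

```python
import math


def edge_density(nodes, edges):
    """Fraction of possible undirected edges among `nodes` that are present.

    Assumption: a generic density definition, not the metric from the paper.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Keep only edges with both endpoints in the group; ignore self-loops
    # and count each undirected pair once.
    present = {
        frozenset((u, v))
        for u, v in edges
        if u in nodes and v in nodes and u != v
    }
    return len(present) / (n * (n - 1) / 2)


def surprisal_bits(p):
    """Self-information of an event with probability p, in bits.

    Assumption: illustrates the idea that a rarer coincidence (e.g. two
    accounts sharing an unusual attribute value) is more informative.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Toy example: accounts linked when they share some attribute value.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
tight_group = edge_density(["a", "b", "c"], edges)
loose_group = edge_density(["a", "b", "c", "d"], edges)
rare_match = surprisal_bits(0.25)
```

In this toy graph the three accounts a, b, c form a fully connected group, so their density is higher than that of the four-account group, and a match that occurs by chance with probability 0.25 carries 2 bits of information.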

Taxonomy

Topics: Spam and Phishing Detection · Data Stream Mining Techniques · Imbalanced Data Classification Techniques