Distributed Deep Forest and its Application to Automatic Detection of Cash-out Fraud
Ya-Lin Zhang, Jun Zhou, Wenhao Zheng, Ji Feng, Longfei Li, Ziqi Liu, Ming Li, Zhiqiang Zhang, Chaochao Chen, Xiaolong Li, Zhi-Hua Zhou, Yuan (Alan) Qi

TL;DR
This paper presents a distributed deep forest framework for extremely large-scale tasks and demonstrates its effectiveness on cash-out fraud detection with more than 100 million samples, where it outperforms existing models and reduces economic losses.
Contribution
The authors develop a distributed deep forest model with several improvements, enabling it to handle extremely large-scale tasks like fraud detection efficiently.
Findings
Deep forest outperforms existing models in large-scale fraud detection.
The distributed implementation effectively handles over 100 million samples.
The model significantly reduces economic losses due to fraud.
Abstract
Internet companies need to handle large-scale machine learning applications on a daily basis, so distributed implementations of machine learning algorithms that can handle extra-large-scale tasks with great performance are widely needed. Deep forest is a recently proposed deep learning framework that uses tree ensembles as its building blocks, and it has achieved highly competitive results on tasks from various domains. However, it has not been tested on extremely large-scale tasks. In this work, we developed a distributed version of deep forest based on our parameter server system. To meet the needs of real-world tasks, we introduced many improvements to the original deep forest model, including MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness, a cost-based method for handling prevalent class-imbalanced data,…
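The abstract names a cost-based method for class-imbalanced data combined with MART base learners, but the exact cost scheme is cut off here. The sketch below shows one common way such a cost can enter gradient-boosted trees, under stated assumptions: each class's cost is taken to be inversely proportional to its frequency (an assumption, not necessarily the paper's choice), and that cost scales the gradient and hessian of the binary log loss that a boosted-tree learner fits at each round. The function names `class_costs` and `weighted_logistic_grad_hess` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def class_costs(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed cost scheme (illustrative only): cost of a class is
    (size of the largest class) / (size of this class), so the majority
    class gets cost 1.0 and rare classes get proportionally larger costs."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    majority = max(counts.values())
    return {cls: majority / n for cls, n in counts.items()}


def weighted_logistic_grad_hess(
    labels: Sequence[int], probs: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """First- and second-order terms of the binary log loss with respect to
    the raw margin, each scaled by the cost of the sample's true class.
    A boosted-tree learner fitting these values pays more attention to
    errors on the rare (e.g. fraud) class."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    costs = class_costs(labels)
    grads: List[float] = []
    hess: List[float] = []
    for y, p in zip(labels, probs):
        w = costs[y]
        grads.append(w * (p - y))       # cost-weighted d(loss)/d(margin)
        hess.append(w * p * (1.0 - p))  # cost-weighted d^2(loss)/d(margin)^2
    return grads, hess


if __name__ == "__main__":
    labels = [0] * 98 + [1] * 2  # 2% positives, standing in for fraud cases
    print(class_costs(labels))   # {0: 1.0, 1: 49.0}
    grads, hess = weighted_logistic_grad_hess(labels, [0.1] * len(labels))
    print(grads[0], grads[-1])
```

With 2% positives, a missed positive carries 49 times the gradient weight of a majority-class sample, which counteracts the tendency of an unweighted learner to predict the majority class everywhere.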
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Data Stream Mining Techniques
