SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection
Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu

TL;DR
SUOD is a modular system designed to accelerate large-scale unsupervised heterogeneous outlier detection by optimizing data reduction, model approximation, and task load balancing, while maintaining accuracy.
Contribution
The paper introduces SUOD, a novel acceleration framework that significantly speeds up training and prediction in heterogeneous outlier detection without sacrificing performance.
Findings
Effective acceleration on 20+ benchmark datasets
Maintains high detection accuracy
Demonstrated success in a real-world fraud detection case
Abstract
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects among general samples, with numerous high-stakes applications including fraud detection and intrusion detection. Due to the lack of ground truth labels, practitioners often have to build a large number of unsupervised, heterogeneous models (i.e., different algorithms with varying hyperparameters) for further combination and analysis, rather than relying on a single model. How can training, and the scoring of newly arriving samples by outlyingness (referred to as prediction throughout the paper), be accelerated when a large number of unsupervised, heterogeneous OD models are involved? In this study, we propose a modular acceleration system, called SUOD, to address this question. The proposed system focuses on three complementary acceleration aspects (data reduction for high-dimensional data, approximation for costly models, and taskload…
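To make the task load balancing aspect named in the TL;DR more concrete, here is a minimal sketch of a generic greedy scheduling heuristic (longest processing time first). It is not taken from the paper and should not be read as SUOD's actual scheduler. The function name `balance_tasks` and the cost values are hypothetical, and the sketch assumes that a per-model cost estimate is already available.

```python
import heapq


def balance_tasks(costs, n_workers):
    """Assign tasks to workers so that total estimated cost per worker is roughly even.

    Generic greedy heuristic (illustrative assumption, not the paper's method):
    visit tasks from most to least expensive and give each one to the worker
    with the smallest load so far.
    """
    # Min-heap of (current_load, worker_id); the least-loaded worker is popped first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Process tasks in descending order of estimated cost.
    for task, cost in sorted(enumerate(costs), key=lambda tc: -tc[1]):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))

    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


# Hypothetical cost estimates for six heterogeneous detectors (arbitrary units).
predicted_costs = [9, 1, 8, 2, 7, 3]

# Naive split: first half of the model list to worker 0, second half to worker 1.
naive_loads = [sum(predicted_costs[:3]), sum(predicted_costs[3:])]

assignment, loads = balance_tasks(predicted_costs, n_workers=2)
print(naive_loads)  # [18, 12]
print(loads)        # [15, 15]
```

With these made-up costs, splitting the model list in half leaves one worker with 50% more work than the other, while the cost-aware assignment evens the loads out. In a parallel run the slowest worker determines the total wall-clock time, which is why an uneven split wastes capacity.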
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques · Data-Driven Disease Surveillance
