A Unified Subspace Outlier Ensemble Framework for Outlier Detection in High Dimensional Spaces
Zengyou He, Xiaofei Xu, Shengchun Deng

TL;DR
This paper introduces a unified ensemble framework for high-dimensional outlier detection, shows that existing methods are special cases of that framework, and presents a fast, simple algorithm called SOE1 that performs well on real and synthetic data.
Contribution
The paper proposes a unified ensemble framework for high-dimensional outlier detection and introduces the SOE1 algorithm, which is simple, fast, and effective.
Findings
SOE1 achieves comparable accuracy to state-of-the-art methods.
SOE1 is significantly faster, requiring only two dataset scans.
The unified framework encompasses existing outlier detection approaches.
Abstract
The task of outlier detection is to find small groups of data objects that are exceptional when compared with the large remainder of the data. Detecting such outliers is important for many applications, such as fraud detection and customer migration. Most such applications are high-dimensional domains in which the data may contain hundreds of dimensions. However, the outlier detection problem itself is not well defined, and none of the existing definitions is widely accepted, especially in high-dimensional spaces. In this paper, our first contribution is to propose a unified framework for outlier detection in high-dimensional spaces from an ensemble-learning viewpoint. In our new framework, the outlying-ness of each data object is measured by fusing outlier factors in different subspaces using a combination function. Accordingly, we show that all existing research on outlier detection can…
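The fusion idea described in the abstract (one outlier factor per subspace, merged by a combination function) can be illustrated with a minimal sketch. This is not the paper's SOE1 algorithm: the per-subspace factor used here (an absolute z-score on a single attribute), the choice of one-dimensional subspaces, and the names `zscore_factor` and `ensemble_scores` are all assumptions made purely for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, Iterable, List, Sequence


def zscore_factor(column: Sequence[float]) -> List[float]:
    """Hypothetical per-subspace outlier factor: absolute z-score of each value."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant attribute carries no outlier evidence.
        return [0.0 for _ in column]
    return [abs(x - mu) / sigma for x in column]


def ensemble_scores(
    data: Sequence[Sequence[float]],
    subspaces: Sequence[int],
    combine: Callable[[Iterable[float]], float] = sum,
) -> List[float]:
    """Fuse the outlier factors from each subspace with a combination function.

    Assumption: each subspace is a single attribute index. A subspace could
    equally be a set of attributes with its own detector.
    """
    per_subspace = [zscore_factor([row[d] for row in data]) for d in subspaces]
    # zip(*...) regroups the factors so each tuple holds one object's factors.
    return [combine(factors) for factors in zip(*per_subspace)]


# Toy data: the last object deviates in both attributes.
data = [
    [1.0, 10.0],
    [1.1, 10.2],
    [0.9, 9.8],
    [1.0, 10.1],
    [5.0, 30.0],
]

scores_sum = ensemble_scores(data, subspaces=[0, 1], combine=sum)
scores_max = ensemble_scores(data, subspaces=[0, 1], combine=max)

# Index of the object with the highest fused score.
top = max(range(len(data)), key=lambda i: scores_sum[i])
```

Swapping `combine` between `sum` and `max` changes how evidence from different subspaces is weighed: a sum rewards objects that are mildly unusual in many subspaces, while a maximum rewards objects that are extreme in any single one.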
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques · Artificial Immune Systems Applications
