Outlier Detection in High Dimensional Data
Firuz Kamalov, Ho Hon Leung

TL;DR
This paper introduces a new outlier detection method for high-dimensional data that combines PCA and kernel density estimation, addressing the challenges posed by high-dimensional feature spaces and small sample sizes.
Contribution
The paper presents a novel outlier detection algorithm that improves performance on high-dimensional datasets by leveraging PCA and kernel density estimation.
Findings
Outperforms benchmark methods in F1-score on synthetic and real data.
Achieves better-than-average execution times.
Effectively detects outliers in high-dimensional, small-sample datasets.
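For reference, the F1-score used in the findings above is the harmonic mean of precision and recall. The sketch below computes it for binary outlier labels. The function name `f1_score` and the convention that 1 marks an outlier are assumptions made for illustration; they are not taken from the paper.

```python
def f1_score(y_true, y_pred):
    """F1-score for binary labels, where 1 marks an outlier (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correctly flagged outliers: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because true negatives do not enter the formula, the score is not inflated by the large number of correctly ignored inliers, which is why it is a common choice when outliers are rare.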
Abstract
High-dimensional data poses unique challenges in the outlier detection process. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the F1-score. Our…
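The general idea in the abstract, projecting onto a few principal components and then scoring points by estimated density, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how components are chosen, which kernel or bandwidth is used, or how scores are computed. The function names, the fixed `n_components` and `bandwidth` defaults, the Gaussian kernel, and the leave-one-out estimate are all choices made here for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kde_log_density(Z, bandwidth):
    """Leave-one-out Gaussian KDE log-density at each row of Z."""
    n, d = Z.shape
    sq_dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    log_k = -0.5 * sq_dist / bandwidth**2
    np.fill_diagonal(log_k, -np.inf)  # exclude each point from its own estimate
    # Log-sum-exp over the other points, for numerical stability.
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    log_norm = np.log(n - 1) + 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    return log_sum - log_norm

def anomaly_scores(X, n_components=2, bandwidth=1.0):
    """Higher score = lower estimated density in the projected space."""
    Z = pca_project(X, n_components)
    return -kde_log_density(Z, bandwidth)
```

Points with the largest scores lie in the sparsest regions of the projected space and are the candidates for outliers. In practice the number of components and the bandwidth would need to be tuned or chosen by a rule, and the fixed defaults here are placeholders only.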
