Outlier Detection in High Dimensional Data

Firuz Kamalov; Ho Hon Leung

arXiv:1909.03681·cs.LG·September 22, 2020


TL;DR

This paper introduces a new outlier detection method for high-dimensional data that combines principal component analysis (PCA) and kernel density estimation, addressing the challenges posed by high-dimensional feature spaces and small sample sizes.

Contribution

The paper presents a novel outlier detection algorithm that improves performance on high-dimensional datasets by leveraging PCA and kernel density estimation.

Findings

1. Outperforms benchmark methods in F1-score on synthetic and real data.
2. Achieves better-than-average execution times.
3. Effectively detects outliers in high-dimensional, small-sample datasets.

Abstract

High-dimensional data poses unique challenges in the outlier detection process. Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of dealing with high-dimensional data by projecting the original data onto a smaller space and using the innate structure of the data to calculate anomaly scores for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by the $F_{1}$-score. Our…
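
The general idea described in the abstract (project onto a smaller space, then score points by estimated density) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this summary. It is a generic PCA-plus-Gaussian-KDE scorer written under several assumptions: the function names, the choice of two components, the leave-one-out density estimate, and the rule-of-thumb bandwidth are all illustrative choices made here, not taken from the paper.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kde_log_density(Z, bandwidth):
    """Leave-one-out Gaussian KDE log-density at each row of Z."""
    n, d = Z.shape
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    log_k = -0.5 * sq_dist / bandwidth**2
    np.fill_diagonal(log_k, -np.inf)  # exclude each point from its own estimate
    # Log-sum-exp over the other points, for numerical stability.
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    log_norm = np.log(n - 1) + d * np.log(bandwidth) + 0.5 * d * np.log(2 * np.pi)
    return log_sum - log_norm

def anomaly_scores(X, n_components=2, bandwidth=None):
    """Higher score = lower estimated density in the projected space."""
    Z = pca_project(X, n_components)
    if bandwidth is None:
        # Assumption: a Scott-style rule of thumb, not the paper's choice.
        n, d = Z.shape
        bandwidth = Z.std(axis=0).mean() * n ** (-1.0 / (d + 4))
    return -kde_log_density(Z, bandwidth)

# Small-sample, many-feature toy data: 40 points, 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
X[0] += 6.0  # plant a single outlier at index 0 (hypothetical example data)

scores = anomaly_scores(X)
print("most anomalous index:", int(np.argmax(scores)))
```

In this toy setup the planted point dominates the variance, so the first principal direction points toward it and it lands far from the rest of the data in the projected space. In practice, the number of components and the bandwidth would need tuning, and an outlier that does not influence the leading components could be missed by a projection like this.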
