Comparison of Outlier Detection Techniques for Structured Data

Amulya Agarwal; Nitin Gupta

arXiv:2106.08779 · cs.LG · June 17, 2021 · 6 cites

Open Access

TL;DR

This paper compares various outlier detection techniques for structured data to assist data scientists in selecting appropriate algorithms for improved machine learning model performance.

Contribution

It provides a comparative analysis of existing outlier detection methods, highlighting their strengths and use cases for structured data.

Findings

01

Certain techniques outperform others in specific data scenarios

02

Removing outliers from the training data before modeling can improve predictions

03

The paper offers guidance for choosing suitable outlier detection methods

Abstract

An outlier is an observation or data point that lies far from the rest of the data points in a given dataset; in other words, an outlier lies far from the center of mass of the observations. The presence of outliers can skew statistical measures and data distributions, which can lead to a misleading representation of the underlying data and relationships. Removing outliers from the training dataset before modeling can give better predictions. As machine learning advances, outlier detection models are advancing at a good pace as well. The goal of this work is to highlight and compare some of the existing outlier detection techniques so that data scientists can use this information to select an outlier detection algorithm when building a machine learning model.
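The "far from the center of mass" definition in the abstract can be illustrated with a minimal sketch. This is not a technique taken from the paper: the function names, the sample points, and the rule "flag a point if its distance from the centroid exceeds 3 times the median distance" are all assumptions chosen for illustration only.

```python
import math
import statistics


def centroid(points):
    """Coordinate-wise mean (center of mass) of equally weighted points."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def flag_outliers(points, factor=3.0):
    """Return one boolean per point: True if the point is 'far' from the centroid.

    'Far' is a hypothetical rule for this sketch: the Euclidean distance to
    the centroid is greater than `factor` times the median of all distances.
    """
    center = centroid(points)
    distances = [math.dist(p, center) for p in points]
    cutoff = factor * statistics.median(distances)
    return [d > cutoff for d in distances]


# Illustrative data (made up): five points near (1, 1) and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0), (0.9, 0.8), (10.0, 10.0)]
print(flag_outliers(points))  # [False, False, False, False, False, True]
```

Note that the outlier itself pulls the centroid toward it, which is one concrete way outliers skew statistical measures; the median is used for the cutoff here because it is less affected by that pull than the mean would be.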

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Advanced Statistical Methods and Models