Outlier detection for mixed-type data: A novel approach

Efthymios Costa; Ioanna Papatsouma

arXiv:2308.09562 · stat.ME · December 12, 2023

Open Access

TL;DR

This paper introduces a new outlier detection method tailored to mixed-type data. It identifies anomalies with few false positives and little user intervention, and it can be used as a preprocessing step or combined with other algorithms.

Contribution

The paper presents a novel outlier detection approach designed specifically for data with both discrete and continuous variables, gives general guidelines for hyperparameter selection, and reports very good performance in simulations.

Findings

01 High detection accuracy for outliers in mixed data

02 Minimal false positive rate

03 Applicable as preprocessing or with other algorithms

Abstract

Outlier detection can serve as an extremely important tool for researchers from a wide range of fields. From banking and marketing to the social sciences and healthcare, outlier detection techniques are very useful for identifying subjects that exhibit different and sometimes peculiar behaviours. When the data set available to the researcher consists of both discrete and continuous variables, outlier detection presents unprecedented challenges. In this paper we propose a novel method that detects outlying observations in settings of mixed-type data, while reducing the required user interaction and providing general guidelines for selecting suitable hyperparameter values. The methodology is assessed through a series of simulations on data sets with varying characteristics and achieves very good performance levels. Our method demonstrates a high…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Advanced Statistical Methods and Models · Artificial Immune Systems Applications