An Optimization Model for Outlier Detection in Categorical Data

Zengyou He; Xiaofei Xu; Shengchun Deng

arXiv:cs/0503081·cs.DB·May 23, 2007·58 cites

TL;DR

This paper introduces an optimization-based approach for detecting outliers in categorical data, addressing a gap in existing methods primarily designed for numeric data, and demonstrates its effectiveness through experiments.

Contribution

It formulates outlier detection in categorical data as a novel optimization problem and proposes a local-search heuristic algorithm for efficient solution finding.

Findings

01

The proposed model outperforms existing methods on real datasets.

02

The heuristic algorithm efficiently detects outliers in large synthetic datasets.

03

Experimental results confirm the model's superiority in accuracy and efficiency.

Abstract

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the large remainder of the data. Detecting such outliers is important for many applications, such as fraud detection and customer migration. Most existing methods are designed for numeric data and encounter problems in real-life applications that contain categorical data. In this paper, we formally define the problem of outlier detection in categorical data as an optimization problem from a global viewpoint. Moreover, we present a local-search heuristic-based algorithm for efficiently finding feasible solutions. Experimental results on real datasets and large synthetic datasets demonstrate the superiority of our model and algorithm.
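
To make the idea of "outlier detection as optimization plus local search" concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not state the objective function, so this sketch assumes one: pick k records whose removal minimizes the sum of per-attribute Shannon entropies of the remaining data. The names `entropy_sum`, `local_search_outliers`, and `max_passes` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy_sum(rows):
    """Sum of per-attribute Shannon entropies over a list of categorical tuples.

    ASSUMPTION: this objective is illustrative; the abstract does not name one.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):
        for count in Counter(column).values():
            p = count / n
            total -= p * log2(p)
    return total


def local_search_outliers(rows, k, max_passes=10):
    """Return (sorted outlier indices, objective value) found by swap-based local search."""
    n = len(rows)
    outliers = set(range(k))  # arbitrary initial labeling of k records as outliers

    def objective(candidate):
        # Entropy of the data that remains after removing the candidate outliers.
        return entropy_sum([r for i, r in enumerate(rows) if i not in candidate])

    best = objective(outliers)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            if i in outliers:
                continue
            # Try exchanging record i with each current outlier; keep the best swap.
            best_swap, best_val = None, best
            for j in outliers:
                value = objective((outliers - {j}) | {i})
                if value < best_val - 1e-12:
                    best_swap, best_val = j, value
            if best_swap is not None:
                outliers = (outliers - {best_swap}) | {i}
                best = best_val
                improved = True
        if not improved:
            break  # local optimum: no single swap lowers the objective
    return sorted(outliers), best


# Eight identical records plus two records with rare attribute values.
data = [("a", "x")] * 8 + [("b", "y"), ("c", "z")]
found, value = local_search_outliers(data, k=2)
print(found, value)
```

Each swap re-evaluates the objective from scratch, which keeps the sketch short but costs O(n) per candidate; an implementation aimed at large datasets would update attribute-value counts incrementally instead. Like any local search, it returns a local optimum that depends on the initial labeling.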

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Rough Sets and Fuzzy Logic · Advanced Statistical Methods and Models