Homophily Outlier Detection in Non-IID Categorical Data
Guansong Pang, Longbing Cao, Ling Chen

TL;DR
This paper presents a novel outlier detection framework for non-IID categorical data that captures interdependent outlier factors using a graph-based approach, significantly improving detection accuracy over existing methods.
Contribution
It introduces a distribution-sensitive outlier detection framework that models non-IID dependencies via value graphs and outlierness propagation, enhancing detection in complex data.
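The value-graph and propagation idea can be illustrated with a small sketch. This is not the authors' formulation: the mode-based initial score, the conditional-probability edge weights, the damping factor, and the function names `value_outlierness` and `object_scores` are all assumptions chosen for illustration. Each node is a (feature index, value) pair, and a random walk biased toward rare values spreads outlierness between values that co-occur.

```python
from collections import Counter, defaultdict

def value_outlierness(rows, damping=0.85, iters=100):
    """Score each (feature index, value) node with a biased random walk.
    Illustrative only; the scoring formulas here are assumptions."""
    n, d = len(rows), len(rows[0])
    freq = Counter((j, r[j]) for r in rows for j in range(d))
    nodes = list(freq)

    # Frequency of the most common value (mode) of each feature.
    mode = defaultdict(int)
    for (j, _), c in freq.items():
        mode[j] = max(mode[j], c)

    # Assumed initial outlierness: relative deviation from the mode,
    # averaged with how weak the mode itself is.
    init = {}
    for (j, v), c in freq.items():
        p_v, p_m = c / n, mode[j] / n
        init[(j, v)] = 0.5 * ((p_m - p_v) / p_m + (1.0 - p_m))

    # Co-occurrence counts between values of different features.
    co = Counter()
    for r in rows:
        for j in range(d):
            for k in range(d):
                if j != k:
                    co[(j, r[j]), (k, r[k])] += 1

    # Edge weight u -> v is init[v] * p(u | v), then row-normalised,
    # so the walk is biased toward values that already look unusual.
    trans = defaultdict(dict)
    for (u, v), c in co.items():
        w = init[v] * c / freq[v]
        if w > 0:
            trans[u][v] = w
    for u in trans:
        total = sum(trans[u].values())
        for v in trans[u]:
            trans[u][v] /= total

    # Damped power iteration; nodes without outgoing edges jump uniformly.
    size = len(nodes)
    pi = {u: 1.0 / size for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / size for u in nodes}
        for u in nodes:
            out = trans.get(u)
            if out:
                for v, p in out.items():
                    nxt[v] += damping * pi[u] * p
            else:
                for v in nodes:
                    nxt[v] += damping * pi[u] / size
        pi = nxt
    return pi

def object_scores(rows, pi):
    """Unweighted sum of value scores per row (an assumed aggregation)."""
    return [sum(pi[(j, v)] for j, v in enumerate(r)) for r in rows]
```

On a toy table such as six rows of `("a", "x")`, two of `("a", "y")`, and two of `("b", "y")`, the rare value `b` ends up with a higher score than `a`, and the `("b", "y")` rows rank as the most outlying objects.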
Findings
Outperforms five state-of-the-art methods with 10%-28% AUC improvement.
Significantly better feature selection for outlier detection.
Effective in high-dimensional, noisy, and complex datasets.
Abstract
Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications, where the outlierness of different entities is interdependent and/or drawn from different probability distributions (non-IID). As a result, important outliers that are too subtle to identify without considering the non-IID nature may go undetected. The issue is intensified in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier detection framework and two instances of it that identify outliers in categorical data by capturing non-IID outlier factors. Our approach first defines and incorporates distribution-sensitive outlier…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques · Network Security and Intrusion Detection
Methods: Feature Selection
