Detecting Outliers in Data with Correlated Measures

Yu-Hsuan Kuo; Zhenhui Li; Daniel Kifer

arXiv:1808.08640·cs.LG·August 28, 2018


TL;DR

The paper introduces an outlier detection method that exploits correlations between measures (e.g., taxi trip distance vs. trip time) through a robust regression model, and reports that it outperforms existing techniques on real-world datasets.

Contribution

The paper presents a new outlier detection approach that explicitly models outliers during robust regression, improving detection accuracy and robustness over prior methods.

Findings

01 Outperforms existing outlier detection methods on real datasets

02 Effectively detects outliers caused by sensor faults or atypical events

03 Demonstrates robustness and generality across different data scenarios

Abstract

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunctions or human operation faults. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier…
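To make the idea of "explicitly modeling the outliers" during regression concrete, here is a minimal sketch. It is not the authors' algorithm, whose details are not given in the abstract. It uses a generic mean-shift formulation, time_i = a * distance_i + b + gamma_i + noise_i, with an L1 penalty on the per-point offsets gamma, solved by alternating least squares and soft-thresholding. The function name, the penalty weight `lam`, and the synthetic taxi-like data are all assumptions made for illustration.

```python
import numpy as np

def fit_with_explicit_outliers(x, y, lam, n_iter=100):
    """Hypothetical helper: alternately minimize
    0.5 * ||y - X @ beta - gamma||^2 + lam * ||gamma||_1
    over the line parameters beta and the per-point outlier offsets gamma."""
    X = np.column_stack([x, np.ones_like(x)])  # columns: slope term, intercept term
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        # Step 1: ordinary least squares on the outlier-corrected response.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Step 2: soft-threshold the residuals; small residuals get gamma = 0.
        resid = y - X @ beta
        gamma = np.sign(resid) * np.maximum(np.abs(resid) - lam, 0.0)
    return beta, gamma

# Synthetic data (an assumption, not from the paper): time grows linearly with distance.
rng = np.random.default_rng(0)
distance = rng.uniform(1.0, 20.0, size=200)
time = 3.0 * distance + 5.0 + rng.normal(0.0, 1.0, size=200)

# Inject three gross errors, e.g. a faulty meter adding an hour to the trip time.
outlier_idx = np.array([5, 50, 150])
time[outlier_idx] += 60.0

beta, gamma = fit_with_explicit_outliers(distance, time, lam=4.0)
flagged = np.flatnonzero(gamma)  # points whose outlier offset is nonzero
print("slope, intercept:", beta)
print("flagged indices:", flagged)
```

In this sketch, fitting and detection happen in the same loop: a point is flagged exactly when its offset gamma_i ends up nonzero, and the line is fit to the data with those offsets subtracted. The threshold `lam` controls how large a residual must be before a point is treated as an outlier, and would need to be tuned to the noise level of real data.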
