LOSDD: Leave-Out Support Vector Data Description for Outlier Detection

Daniel Boiar; Thomas Liebig; Erich Schubert

arXiv:2212.13626·cs.LG·December 29, 2022

TL;DR

LOSDD introduces a leave-out strategy for SVM-based outlier detection on dirty training data: each candidate point is judged by a model trained without it, identified outliers are removed iteratively to reduce masking effects, and incremental training speeds up the repeated fits.

Contribution

The paper proposes a novel leave-out approach for SVDD that enhances outlier detection in contaminated data and offers an efficient incremental training method.

Findings

01. Effective outlier detection in dirty data

02. Reduction of masking effects in outlier identification

03. Incremental training accelerates the leave-out SVM process

Abstract

Support Vector Machines have been successfully used for one-class classification (OCSVM, SVDD) when trained on clean data, but they work much worse on dirty data: outliers present in the training data tend to become support vectors, and are hence considered "normal". In this article, we improve the detection of outliers in dirty training data with a leave-out strategy: by temporarily omitting one candidate at a time, this point can be judged using the remaining data only. We show that this is more effective at scoring the outlierness of points than using the slack term of existing SVM-based approaches. Identified outliers can then be removed from the data, such that outliers hidden by other outliers can be identified, to reduce the problem of masking. Naively, this approach would require training $N$ individual SVMs (and training $O(N^{2})$ SVMs when iteratively removing the…
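
The leave-out idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not include their incremental training. As a loudly labeled assumption, it replaces the trained SVDD with a simpler stand-in: the squared feature-space distance to a uniform-weight kernel mean, where a real SVDD would learn the weights of the center. The function names (`rbf`, `center_distance`, `leave_out_scores`, `iterative_removal`), the `gamma` value, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def rbf(a, b, gamma=0.5):
    """RBF kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def center_distance(train, x, gamma=0.5):
    """Squared feature-space distance from x to the kernel mean of train.

    Assumption: this uniform-weight center is a stand-in for a trained
    SVDD, which would instead learn non-uniform weights for its center.
    """
    n = len(train)
    cross = sum(rbf(x, t, gamma) for t in train) / n
    within = sum(rbf(s, t, gamma) for s in train for t in train) / (n * n)
    return rbf(x, x, gamma) - 2.0 * cross + within


def leave_out_scores(data, gamma=0.5):
    """Score every point with a model built from all *other* points."""
    scores = []
    for i, x in enumerate(data):
        rest = data[:i] + data[i + 1:]  # temporarily omit the candidate
        scores.append(center_distance(rest, x, gamma))
    return scores


def iterative_removal(data, n_outliers, gamma=0.5):
    """Remove the top-scoring point, rescore the rest, and repeat.

    Rescoring after each removal lets a point that was partly hidden by
    another outlier stand out once that outlier is gone (masking).
    """
    remaining = list(range(len(data)))  # original indices still in play
    removed = []
    for _ in range(n_outliers):
        pts = [data[i] for i in remaining]
        scores = leave_out_scores(pts, gamma)
        worst = max(range(len(pts)), key=scores.__getitem__)
        removed.append(remaining.pop(worst))
    return removed


# Toy data: five points near the origin and two nearby outliers that
# partly mask each other.
data = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.0),
    (3.0, 3.0), (3.1, 3.0),
]
removed = iterative_removal(data, n_outliers=2)
```

This naive loop rebuilds the model once per candidate and again after every removal, which is the cost the abstract describes as the motivation for a more efficient training scheme.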

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques · Water Systems and Optimization

Methods: Support Vector Machine