A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice?

Martin Q. Ma; Yue Zhao; Xiaorong Zhang; Leman Akoglu

arXiv:2104.01422·cs.LG·April 14, 2021·5 cites

TL;DR

This study investigates whether internal, label-free evaluation strategies can effectively select outlier detection models. It finds that current strategies are insufficient: the models they select are only comparable to random choices.

Contribution

The paper provides a comprehensive large-scale evaluation of internal model selection strategies for unsupervised outlier detection, highlighting their limitations.

Findings

01

None of the strategies outperform random selection.

02

The models selected by current internal strategies perform only comparably to a state-of-the-art detector with a random configuration.

03

The study introduces a large open testbed with diverse tasks and models.

Abstract

Given an unsupervised outlier detection task, how should one select a detection algorithm as well as its hyperparameters (jointly called a model)? Unsupervised model selection is notoriously difficult in the absence of hold-out validation data with ground-truth labels. Therefore, the problem is vastly understudied. In this work, we study the feasibility of employing internal model evaluation strategies for selecting a model for outlier detection. These so-called internal strategies rely solely on the input data (without labels) and the output (outlier scores) of the candidate models. We set up (and open-source) a large testbed with 39 detection tasks and 297 candidate models comprising 8 detectors and various hyperparameter configurations. We evaluate 7 different strategies on their ability to discriminate between models w.r.t. detection performance, without using any labels. Our…
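To make the idea of an internal strategy concrete, here is a minimal sketch of label-free model selection. It assumes one simple consensus heuristic: prefer the candidate whose outlier scores agree most, by rank correlation, with the scores of all other candidates. This heuristic, the function names, and the toy data are illustrative assumptions, not the paper's testbed or necessarily one of its 7 strategies. The sketch only shows the inputs such a strategy works with: outlier scores from candidate models, and no labels.

```python
import random
from statistics import mean


def ranks(values):
    """Return 0-based ranks of values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def spearman(a, b):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def select_by_consensus(score_lists):
    """Pick the candidate whose scores agree most with the other candidates.

    score_lists maps a (hypothetical) model name to its outlier scores,
    all computed on the same data points. No labels are used.
    """
    consensus = {}
    for name, scores in score_lists.items():
        agreement = [
            spearman(scores, other_scores)
            for other, other_scores in score_lists.items()
            if other != name
        ]
        consensus[name] = mean(agreement)
    best = max(consensus, key=consensus.get)
    return best, consensus


# Toy outlier scores from three hypothetical candidate models.
rng = random.Random(0)
base = [rng.random() for _ in range(50)]
candidates = {
    "model_a": [s + 0.05 * rng.random() for s in base],
    "model_b": [s + 0.05 * rng.random() for s in base],
    "model_c": [rng.random() for _ in range(50)],  # unrelated to the others
}
best, consensus = select_by_consensus(candidates)
print(best, consensus)
```

Agreement among candidates does not imply detection quality, which is consistent with the study's finding that internal strategies are not a reliable basis for choosing a model.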

Code & Models

Repositories

yzhao062/yzhao062
pytorch

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Water Systems and Optimization · Data-Driven Disease Surveillance