Anomaly Detection: How to Artificially Increase your F1-Score with a Biased Evaluation Protocol

Damien Fourure; Muhammad Usama Javaid; Nicolas Posocco; Simon Tihon

arXiv:2106.16020·cs.LG·July 1, 2021

TL;DR

This paper shows that F1-score and AVPR are unreliable for evaluating anomaly detection: both are highly sensitive to the contamination rate, and neither reflects the intrinsic difficulty of a dataset. It argues for more robust metrics such as AUC.

Contribution

The paper exposes the bias that some evaluation protocols introduce in anomaly detection and proposes a more robust, standardized evaluation procedure based on metrics such as AUC.

Findings

01 F1-score and AVPR are highly sensitive to the contamination rate.

02 Modifying the train-test split procedure can artificially inflate these metrics.

03 F1-score and AVPR are not suitable for comparing performance across different datasets.
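The second finding can be illustrated with a minimal sketch. It assumes a protocol in which all anomalies go to the test set and only a fraction of the normal samples join them; the dataset sizes, the detector's true-positive and false-positive rates, and the function names are hypothetical and are not taken from the paper.

```python
def contamination_of_test_set(n_normal, n_anomaly, normal_test_fraction):
    """Share of anomalies in a test set that holds every anomaly plus
    a fraction of the normal samples (assumed split protocol)."""
    n_normal_test = round(n_normal * normal_test_fraction)
    return n_anomaly / (n_anomaly + n_normal_test)


def f1_at_fixed_rates(tpr, fpr, contamination):
    """F1-score of a detector with fixed TPR and FPR, expressed per
    test sample, for a given contamination rate."""
    tp = tpr * contamination            # anomalies correctly flagged
    fp = fpr * (1.0 - contamination)    # normal samples wrongly flagged
    fn = (1.0 - tpr) * contamination    # anomalies missed
    return 2.0 * tp / (2.0 * tp + fp + fn)


# Hypothetical dataset and detector; only the split changes.
N_NORMAL, N_ANOMALY = 10_000, 500
TPR, FPR = 0.8, 0.05

for fraction in (0.5, 0.1):
    c = contamination_of_test_set(N_NORMAL, N_ANOMALY, fraction)
    f1 = f1_at_fixed_rates(TPR, FPR, c)
    print(f"normal samples in test: {fraction:.0%}  "
          f"contamination: {c:.3f}  F1: {f1:.3f}")
```

The detector is identical in both runs. Sending fewer normal samples to the test set raises the contamination rate, and the F1-score rises with it.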

Abstract

Anomaly detection is a widely explored domain in machine learning. Many models are proposed in the literature, and compared through different metrics measured on various datasets. The most popular metrics used to compare performances are F1-score, AUC and AVPR. In this paper, we show that F1-score and AVPR are highly sensitive to the contamination rate. One consequence is that it is possible to artificially increase their values by modifying the train-test split procedure. This leads to misleading comparisons between algorithms in the literature, especially when the evaluation protocol is not well detailed. Moreover, we show that the F1-score and the AVPR cannot be used to compare performances on different datasets as they do not reflect the intrinsic difficulty of modeling such data. Based on these observations, we claim that F1-score and AVPR should not be used as metrics for anomaly…
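The contrast between F1-score and AUC described in the abstract can be sketched on toy data. The anomaly scores, the threshold, and the helper names below are illustrative assumptions, not values or code from the paper. Replicating the normal samples lowers the contamination rate without changing how well the scores separate the two classes.

```python
def auc(normal_scores, anomaly_scores):
    """Fraction of (anomaly, normal) pairs in which the anomaly gets
    the higher score; ties count as half a win."""
    wins = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomaly_scores) * len(normal_scores))


def f1(normal_scores, anomaly_scores, threshold):
    """F1-score when every sample scoring at or above the threshold
    is flagged as an anomaly."""
    tp = sum(s >= threshold for s in anomaly_scores)
    fp = sum(s >= threshold for s in normal_scores)
    fn = len(anomaly_scores) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy anomaly scores (assumed): higher means more anomalous.
normal = [0.1, 0.2, 0.3, 0.4, 0.6]
anomaly = [0.5, 0.7, 0.9]
THRESHOLD = 0.5

for copies in (1, 5, 20):
    normals = normal * copies  # same score distribution, more normal samples
    contamination = len(anomaly) / (len(anomaly) + len(normals))
    print(f"contamination: {contamination:.3f}  "
          f"AUC: {auc(normals, anomaly):.3f}  "
          f"F1: {f1(normals, anomaly, THRESHOLD):.3f}")
```

In this sketch the AUC stays the same for every contamination rate, while the F1-score drops as normal samples are added, even though the scoring function never changes.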

Code & Models

Repositories

euranova/F1-Score-is-Biased (official)