Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee

Cedric Renggli; Luka Rimanic; Nora Hollenstein; Ce Zhang

arXiv:2108.13034 · cs.LG · November 8, 2021 · 1 citation

TL;DR

This paper introduces FeeBee, a framework that evaluates Bayes error estimators on real-world datasets by injecting label noise at multiple levels and analyzing how the estimates respond, which addresses the question of their practical applicability.
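
To make the label-noise idea concrete, here is a minimal sketch, not the authors' implementation. It assumes one specific noise model that this summary does not spell out: with probability rho, each label is replaced by a class drawn uniformly at random from the C classes. Under that assumption the Bayes error of the noisy distribution is BER + rho * (1 - 1/C - BER), which is linear in rho, so an estimator can be checked against a known trend even when the clean BER is unknown. The function names are hypothetical.

```python
import numpy as np


def inject_uniform_label_noise(labels, rho, num_classes, rng):
    """With probability rho, replace each label by a uniformly random class.

    Assumed noise model (not confirmed by this page): the replacement class
    is drawn from all num_classes classes, so it may equal the original label.
    """
    labels = np.asarray(labels)
    flip = rng.random(labels.shape[0]) < rho
    random_labels = rng.integers(0, num_classes, size=labels.shape[0])
    return np.where(flip, random_labels, labels)


def noisy_ber(ber, rho, num_classes):
    """Bayes error after uniform label noise: ber + rho * (1 - 1/C - ber)."""
    return ber + rho * (1.0 - 1.0 / num_classes - ber)


# Small demonstration on synthetic labels (illustrative values only).
rng = np.random.default_rng(0)
num_classes = 10
clean = rng.integers(0, num_classes, size=50_000)
for rho in (0.0, 0.25, 0.5, 1.0):
    noisy = inject_uniform_label_noise(clean, rho, num_classes, rng)
    changed = float(np.mean(noisy != clean))
    # Expected fraction of changed labels is rho * (1 - 1/C).
    print(f"rho={rho:.2f} changed={changed:.3f} "
          f"BER if clean BER were 0.05: {noisy_ber(0.05, rho, num_classes):.3f}")
```

At rho = 1 the labels carry no information about the features, and the formula gives 1 - 1/C, the error of random guessing, regardless of the clean BER.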

Contribution

FeeBee is the first systematic framework to assess BER estimators on real-world data, considering computational complexity, hyper-parameter sensitivity, and robustness through controlled label noise.

Findings

01 Analyzed 7 BER estimators across 6 datasets.
02 Identified strengths and weaknesses of each estimator.
03 Provided insights into estimator robustness and practicality.

Abstract

The Bayes error rate (BER) is a fundamental concept in machine learning that quantifies the lowest error, and hence the best possible accuracy, any classifier can achieve on a fixed probability distribution. Despite years of research on building estimators of lower and upper bounds for the BER, these were usually compared only on synthetic datasets with known probability distributions, leaving two key questions unanswered: (1) How well do they perform on real-world datasets? and (2) How practical are they? Answering these is not trivial. Apart from the obvious challenge of an unknown BER for real-world datasets, there are two main aspects any BER estimator needs to overcome in order to be applicable in real-world settings: (1) the computational and sample complexity, and (2) the sensitivity and selection of hyper-parameters. In this work, we propose FeeBee, the first principled framework for analyzing and…
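
As an illustration of what an estimator of BER bounds can look like, the sketch below uses the classical 1-nearest-neighbor result of Cover and Hart: asymptotically, the 1-NN error R satisfies BER <= R <= BER * (2 - C/(C-1) * BER), which can be inverted into a lower bound on the BER. This is a generic textbook example on synthetic data, not the code in the FeeBee repository, and the function names and the toy dataset are assumptions made for this example.

```python
import numpy as np


def loo_1nn_error(features, labels):
    """Leave-one-out 1-NN error with Euclidean distance (O(n^2) memory)."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    sq = np.sum(x ** 2, axis=1)
    # Squared pairwise distances; the diagonal is masked so that a point
    # is never its own nearest neighbor.
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist, np.inf)
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(y[nearest] != y))


def cover_hart_lower_bound(err_1nn, num_classes):
    """Invert the Cover-Hart upper bound to get a lower bound on the BER."""
    c = float(num_classes)
    inner = max(0.0, 1.0 - c / (c - 1.0) * err_1nn)  # clip finite-sample noise
    return float((c - 1.0) / c * (1.0 - np.sqrt(inner)))


# Toy data: two 1-D unit-variance Gaussians centered at -1 and +1.
rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)
x = (rng.normal(size=n) + np.where(y == 1, 1.0, -1.0)).reshape(-1, 1)

err = loo_1nn_error(x, y)
lower = cover_hart_lower_bound(err, num_classes=2)
print(f"1-NN error (upper bound on BER): {err:.3f}")
print(f"Cover-Hart lower bound on BER:   {lower:.3f}")
```

Both bounds hold only asymptotically, so on a finite sample they are estimates rather than guarantees, which is one reason evaluating such estimators on real data is not straightforward.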

Code & Models

Repositories

ds3lab/feebee (TensorFlow, official)

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning