Quantifying Uncertainty in Error Consistency: Towards Reliable Behavioral Comparison of Classifiers
Thomas Klein, Sascha Meyen, Wieland Brendel, Felix A. Wichmann, Kristof Meding

TL;DR
This paper enhances the measurement of error consistency (EC) in ML model benchmarking by introducing confidence intervals via bootstrapping and a model relating EC to response copying, enabling more reliable behavioral comparisons.
Contribution
It proposes a method to compute confidence intervals for EC and introduces a model linking EC to response copying, improving the reliability of behavioral benchmarking in ML.
Findings
Many reported differences between deep vision models and humans are not statistically significant.
The new methodology allows for more conclusive and reliable behavioral comparisons.
Researchers can now design experiments with sufficient power to detect true behavioral differences.
Abstract
Benchmarking models is a key factor for the rapid progress in machine learning (ML) research. Thus, further progress depends on improving benchmarking metrics. A standard metric to measure the behavioral alignment between ML models and human observers is error consistency (EC). EC allows for more fine-grained comparisons of behavior than other metrics such as accuracy, and has been used in the influential Brain-Score benchmark to rank different DNNs by their behavioral consistency with humans. Previously, EC values have been reported without confidence intervals. However, empirically measured EC values are typically noisy -- thus, without confidence intervals, valid benchmarking conclusions are problematic. Here we improve on standard EC in two ways: First, we show how to obtain confidence intervals for EC using a bootstrapping technique, allowing us to derive significance tests for EC.…
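As a rough illustration of the idea in the abstract, the sketch below computes EC as Cohen's kappa on per-trial correctness and attaches a percentile bootstrap interval by resampling trials with replacement. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`error_consistency`, `bootstrap_ci`), the percentile method, and the defaults `n_boot=2000` and `alpha=0.05` are all illustrative choices, and the paper's exact resampling scheme may differ.

```python
import random


def error_consistency(correct_a, correct_b):
    """Cohen's kappa on per-trial correctness (1 = correct, 0 = error)."""
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(correct_a) / n  # accuracy of observer A
    p_b = sum(correct_b) / n  # accuracy of observer B
    # Observed agreement: both correct or both wrong on the same trial.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    # Agreement expected from two independent observers with these accuracies.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if c_exp == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (c_obs - c_exp) / (1 - c_exp)


def bootstrap_ci(correct_a, correct_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for EC (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(correct_a)
    stats = []
    for _ in range(n_boot):
        # Resample trial indices so the pairing between observers is preserved.
        idx = [rng.randrange(n) for _ in range(n)]
        ec = error_consistency([correct_a[i] for i in idx],
                               [correct_b[i] for i in idx])
        if ec == ec:  # skip NaN from degenerate resamples
            stats.append(ec)
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper
```

Under this sketch, two observers would be called reliably different in EC only if the relevant intervals do not overlap or a test derived from the bootstrap distribution rejects. An interval that stays wide at a given trial count suggests the experiment is underpowered.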
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Explainable Artificial Intelligence (XAI) · Neural and Behavioral Psychology Studies
