Humanly Certifying Superhuman Classifiers
Qiongkai Xu, Christian Walder, Chenchen Xu

TL;DR
This paper introduces a theoretical framework to evaluate and certify superhuman classifier performance relative to an unobserved oracle, using imperfect human annotations, and validates it through experiments and NLP model analysis.
Contribution
The paper develops a novel theory for estimating classifier accuracy against an unobserved oracle using only human annotations, enabling certification of superhuman performance.
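As a rough illustration of why this matters, the sketch below simulates a toy setting with a known oracle. It is not the paper's bound or estimator. The function name `simulate`, the binary label space, the independent-error assumption, and the accuracy values 0.95 and 0.85 are all hypothetical choices made for the example. It shows only that scoring a model against a noisy human annotator can understate its accuracy against the oracle.

```python
import random


def simulate(n=20000, model_acc=0.95, human_acc=0.85, seed=0):
    """Toy simulation (hypothetical parameters, not the paper's setup)."""
    rng = random.Random(seed)

    # Oracle labels: the ground truth, unobserved in practice.
    oracle = [rng.randint(0, 1) for _ in range(n)]

    def noisy(labels, acc):
        # Keep each binary label with probability `acc`, otherwise flip it.
        return [y if rng.random() < acc else 1 - y for y in labels]

    human = noisy(oracle, human_acc)  # imperfect human annotator
    model = noisy(oracle, model_acc)  # classifier that beats the human

    def agreement(a, b):
        # Fraction of positions where the two label sequences match.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    return {
        "model_vs_oracle": agreement(model, oracle),
        "human_vs_oracle": agreement(human, oracle),
        "model_vs_human": agreement(model, human),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the expected model-human agreement is 0.95 × 0.85 + 0.05 × 0.15 = 0.815. That is well below the model's assumed oracle accuracy of 0.95, so treating the human labels as ground truth would hide the model's advantage.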
Findings
The bounds converge reliably in toy experiments with known oracles.
The theory effectively identifies superhuman models in NLP tasks.
Meta-analysis suggests several recent models surpass human performance with high probability.
Abstract
Estimating the performance of a machine learning system is a longstanding challenge in artificial intelligence research. Today, this challenge is especially relevant given the emergence of systems which appear to increasingly outperform human beings. In some cases, this "superhuman" performance is readily demonstrated; for example by defeating legendary human players in traditional two player games. On the other hand, it can be challenging to evaluate classification models that potentially surpass human performance. Indeed, human annotations are often treated as a ground truth, which implicitly assumes the superiority of the human over any models trained on human annotations. In reality, human annotators can make mistakes and be subjective. Evaluating the performance with respect to a genuine oracle may be more objective and reliable, even when querying the oracle is expensive or…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
