Evaluating Superhuman Models with Consistency Checks
Lukas Fluri, Daniel Paleka, Florian Tramèr

TL;DR
This paper introduces a framework for evaluating superhuman models through consistency checks that identify logical errors, even when ground truth is unavailable, demonstrated on tasks like chess, forecasting, and legal judgments.
Contribution
The paper proposes a novel evaluation framework based on logical consistency checks to assess superhuman models without relying on ground truth.
Findings
Identified logical inconsistencies in superhuman decision-making
Demonstrated framework on chess, forecasting, and legal tasks
Showed models can be flawed despite high performance
Abstract
If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover…
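As a minimal sketch of what a consistency check can look like on the forecasting task, the snippet below tests one simple logical rule: the probabilities assigned to an event and to its negation should sum to 1. The rule, the function names, the toy forecaster, and the 0.05 tolerance are illustrative assumptions and are not taken from the paper. Note that the check needs no ground truth about whether the event actually happens.

```python
from typing import Callable


def negation_violation(
    forecast: Callable[[str], float], event: str, negated_event: str
) -> float:
    """Return |P(event) + P(not event) - 1|; 0.0 means the pair is consistent."""
    return abs(forecast(event) + forecast(negated_event) - 1.0)


def toy_forecaster(question: str) -> float:
    # Hypothetical stand-in for a real model: fixed, deliberately inconsistent answers.
    answers = {
        "Will it rain tomorrow?": 0.7,
        "Will it not rain tomorrow?": 0.5,
    }
    return answers[question]


TOLERANCE = 0.05  # assumed threshold for flagging a violation

violation = negation_violation(
    toy_forecaster, "Will it rain tomorrow?", "Will it not rain tomorrow?"
)
is_consistent = violation <= TOLERANCE
print(f"violation={violation:.2f}, consistent={is_consistent}")
```

A violation here shows that at least one of the two forecasts must be wrong, even though the check cannot say which one.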
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Statistical and Computational Modeling
Methods: Multi-Head Attention · Attention Is All You Need · fail · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection
