Evaluating Superhuman Models with Consistency Checks

Lukas Fluri; Daniel Paleka; Florian Tramèr

arXiv:2306.09983 · cs.LG · October 20, 2023 · 1 cites

TL;DR

This paper introduces a framework for evaluating superhuman models through consistency checks that identify logical errors, even when ground truth is unavailable, demonstrated on tasks like chess, forecasting, and legal judgments.

Contribution

The paper proposes a novel evaluation framework based on logical consistency checks to assess superhuman models without relying on ground truth.

Findings

01 Identified logical inconsistencies in superhuman decision-making
02 Demonstrated framework on chess, forecasting, and legal tasks
03 Showed models can be flawed despite high performance

Abstract

If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover…
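To make the idea of a consistency check concrete, here is a minimal sketch for the forecasting setting. It assumes one simple rule that follows from the axioms of probability: the probability assigned to an event and the probability assigned to its negation should sum to 1. The function names, the tolerance `tol`, and the `toy_forecaster` stand-in model are all hypothetical and chosen for illustration; this is not the authors' implementation, and the specific checks used in the paper may differ.

```python
from typing import Callable, Tuple


def negation_violation(p_event: float, p_negation: float) -> float:
    """Absolute deviation from the rule P(A) + P(not A) = 1."""
    return abs(p_event + p_negation - 1.0)


def check_negation_consistency(
    forecast: Callable[[str], float],
    question: str,
    negated_question: str,
    tol: float = 0.05,  # hypothetical tolerance, not taken from the paper
) -> Tuple[float, bool]:
    """Query the model on a question and its negation.

    Returns the size of the violation and whether it is within `tol`.
    No ground-truth outcome is needed: only the model's own answers
    are compared against each other.
    """
    p = forecast(question)
    q = forecast(negated_question)
    violation = negation_violation(p, q)
    return violation, violation <= tol


def toy_forecaster(question: str) -> float:
    """Hypothetical stand-in for a real model, with hard-coded answers."""
    answers = {
        "Will X happen by 2030?": 0.7,
        "Will X not happen by 2030?": 0.5,
    }
    return answers[question]


violation, consistent = check_negation_consistency(
    toy_forecaster,
    "Will X happen by 2030?",
    "Will X not happen by 2030?",
)
print(f"violation={violation:.2f}, consistent={consistent}")
```

In this toy case the two answers sum to 1.2, so the check flags an inconsistency even though nobody knows whether X will actually happen. At least one of the two forecasts must be wrong, which is the kind of mistake the framework aims to surface.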

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Statistical and Computational Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection