The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz

David Noever; Forrest McKee

arXiv:2411.14486 · cs.CL · November 25, 2024

TL;DR

This paper presents a new evaluation framework using an unsolvable dataset to test large language models' ability to admit ignorance, revealing current limitations and variability across problem types, with implications for AGI development.

Contribution

Introduces a novel dataset and evaluation method for assessing LLMs' uncertainty acknowledgment on unsolvable problems, advancing AGI assessment methodologies.

Findings

01. Models often fail to admit ignorance on unsolvable problems.
02. GPT-4 shows higher uncertainty acknowledgment on more difficult problems.
03. Performance varies significantly across different problem categories.

Abstract

This research introduces a novel evaluation framework designed to assess the ability of large language models (LLMs) to acknowledge uncertainty on 675 fundamentally unsolvable problems. Using a curated dataset of graduate-level grand challenge questions with intentionally unknowable answers, we evaluated twelve state-of-the-art LLMs, including both open- and closed-source models, on their propensity to admit ignorance rather than generate plausible but incorrect responses. The best models achieved 62-68% accuracy in admitting that a problem's solution was unknown, in fields ranging from biology to philosophy and mathematics. We observed an inverse relationship between problem difficulty and model accuracy, with GPT-4 demonstrating higher rates of uncertainty acknowledgment on more challenging problems (35.8%) compared to simpler ones (20.0%). This pattern indicates that models may be more…
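The metric described in the abstract, the share of responses in which a model admits the answer is unknown, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the marker phrases in `UNCERTAINTY_MARKERS`, the helper names, and the `(group, response)` record layout are hypothetical, and simple phrase matching stands in for whatever grading procedure the authors actually used.

```python
from collections import defaultdict

# Hypothetical marker phrases (an assumption); the paper's actual
# grading criteria are not stated in this listing.
UNCERTAINTY_MARKERS = (
    "i don't know",
    "i do not know",
    "unknown",
    "unsolved",
    "no known solution",
)


def admits_ignorance(response: str) -> bool:
    """Return True if the response contains any uncertainty marker."""
    text = response.lower()
    return any(marker in text for marker in UNCERTAINTY_MARKERS)


def acknowledgment_rate(responses) -> float:
    """Fraction of responses that admit the answer is unknown."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum(admits_ignorance(r) for r in responses) / len(responses)


def rate_by_group(records) -> dict:
    """Acknowledgment rate per group (e.g. difficulty level or category).

    `records` is an iterable of (group, response) pairs.
    """
    grouped = defaultdict(list)
    for group, response in records:
        grouped[group].append(response)
    return {group: acknowledgment_rate(rs) for group, rs in grouped.items()}
```

Grouping the records by difficulty level or by subject area would give the kind of per-difficulty and per-category comparison the abstract reports, though the figures quoted there come from the authors' own evaluation, not from this sketch.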

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax