The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz
David Noever, Forrest McKee

TL;DR
This paper presents a new evaluation framework using an unsolvable dataset to test large language models' ability to admit ignorance, revealing current limitations and variability across problem types, with implications for AGI development.
Contribution
Introduces a novel dataset and evaluation method for assessing LLMs' uncertainty acknowledgment on unsolvable problems, advancing AGI assessment methodologies.
Findings
Models often fail to admit ignorance on unsolvable problems.
GPT-4 shows higher uncertainty acknowledgment on more difficult problems.
Performance varies significantly across different problem categories.
Abstract
This research introduces a novel evaluation framework designed to assess large language models' (LLMs) ability to acknowledge uncertainty on 675 fundamentally unsolvable problems. Using a curated dataset of graduate-level grand challenge questions with intentionally unknowable answers, we evaluated twelve state-of-the-art LLMs, including both open- and closed-source models, on their propensity to admit ignorance rather than generate plausible but incorrect responses. The best models scored in the 62-68% accuracy range for admitting that a problem's solution was unknown, in fields ranging from biology to philosophy and mathematics. We observed an inverse relationship between problem difficulty and model accuracy, with GPT-4 demonstrating higher rates of uncertainty acknowledgment on more challenging problems (35.8%) compared to simpler ones (20.0%). This pattern indicates that models may be more…
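The metric described in the abstract, the share of responses in which a model admits the answer is unknown, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the marker phrases, the function names `admits_ignorance` and `acknowledgment_rate_by_group`, and the toy records are hypothetical, and the keyword match stands in for whatever scoring rule the authors actually used, which this summary does not specify.

```python
from collections import defaultdict

# Hypothetical marker phrases; the paper's real scoring rule is not given here.
UNCERTAINTY_MARKERS = (
    "i don't know",
    "i do not know",
    "unknown",
    "unsolved",
    "no known solution",
)


def admits_ignorance(response: str) -> bool:
    """Return True if the response contains any assumed uncertainty marker."""
    text = response.lower()
    return any(marker in text for marker in UNCERTAINTY_MARKERS)


def acknowledgment_rate_by_group(records):
    """Map each group label to the fraction of its responses admitting ignorance.

    `records` is an iterable of (group, response) pairs, where the group can be
    a difficulty level or a subject category.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, response in records:
        totals[group] += 1
        hits[group] += admits_ignorance(response)  # bool counts as 0 or 1
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_records = [
        ("hard", "This problem is currently unsolved."),
        ("hard", "The answer is 42."),
        ("easy", "The answer is 7."),
        ("easy", "I don't know."),
        ("easy", "It equals pi squared."),
        ("easy", "The result is zero."),
    ]
    print(acknowledgment_rate_by_group(toy_records))
```

Grouping by difficulty level or by subject category uses the same function; only the group label in each record changes. A keyword match is a crude proxy and would misclassify hedged or paraphrased answers, so a real evaluation would need a more careful judgment of each response.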
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax
