How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI

Ashwin Kalyan; Abhinav Kumar; Arjun Chandrasekaran; Ashish Sabharwal; Peter Clark

arXiv:2110.14207·cs.CL·December 22, 2021

TL;DR

This paper introduces Fermi Problems as a new reasoning challenge for AI, provides datasets of real and synthetic questions with solutions, and shows that current models struggle with approximate estimation tasks.

Contribution

The paper presents the first datasets and benchmarks for Fermi Problems, a new challenge intended to advance the reasoning abilities of AI systems.

Findings

01

Large language models perform poorly on Fermi Problems, with estimates that are off by two orders of magnitude on average.

02

The datasets include detailed solutions with executable programs and supporting facts.

03

Fermi Problems pose a significant challenge for current AI systems, highlighting areas for future research.
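
To make these findings concrete, the following is a minimal Python sketch of a Fermi estimate written as supporting facts plus a small executable program, together with a measure of how many orders of magnitude an estimate is off. The fact names, their placeholder values, and the function names are illustrative assumptions, not taken from the paper's datasets or from the allenai/fermi repository. The log-ratio measure is one common way to express order-of-magnitude error and is not necessarily the metric the paper uses.

```python
import math

# Hypothetical supporting facts for the title question.
# All values are placeholders chosen for illustration, not measured data.
facts = {
    "attendees": 2000,             # assumed number of participants
    "conference_days": 5,          # assumed length of the event
    "cups_per_person_per_day": 2,  # assumed average consumption
    "litres_per_cup": 0.25,        # assumed cup size
}


def coffee_litres(f: dict) -> float:
    """Combine the supporting facts into a single estimate, in litres."""
    return (
        f["attendees"]
        * f["conference_days"]
        * f["cups_per_person_per_day"]
        * f["litres_per_cup"]
    )


def orders_of_magnitude_off(estimate: float, reference: float) -> float:
    """Absolute base-10 log ratio between an estimate and a reference value."""
    if estimate <= 0 or reference <= 0:
        raise ValueError("both values must be positive")
    return abs(math.log10(estimate / reference))


if __name__ == "__main__":
    estimate = coffee_litres(facts)
    print(f"Estimated coffee consumed: {estimate} litres")
    # An answer of 50 litres against this estimate would differ by a factor of 100.
    print(orders_of_magnitude_off(50, estimate))
```

Under this measure, an estimate that is 100 times too large or 100 times too small scores 2, which is what "off by two orders of magnitude" means.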

Abstract

Many real-world problems require the combined application of multiple reasoning abilities employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because their precise computation is either impractical or impossible. For example, "How much would the sea level rise if all ice in the world melted?" FPs are commonly used in quizzes and interviews to bring out and evaluate the creative reasoning abilities of humans. To do the same for AI systems, we present two datasets: 1) A collection of 1k real-world FPs sourced from quizzes and olympiads; and 2) a bank of 10k synthetic FPs of intermediate complexity to serve as a sandbox for the harder real-world…

Code & Models

Repositories

allenai/fermi
pytorch
