ScienceWorld: Is your Agent Smarter than a 5th Grader?

Ruoyao Wang; Peter Jansen; Marc-Alexandre Côté; Prithviraj Ammanabrolu

arXiv:2203.07540 · cs.CL · November 15, 2022

TL;DR

ScienceWorld is an interactive text-environment benchmark for evaluating agents' scientific reasoning. Its results indicate that agents trained interactively in a grounded environment reason better than large, statically trained models.

Contribution

The paper presents a new interactive text-environment benchmark and reports that grounded, interactive training improves agents' scientific reasoning more than scaling up large pre-trained models.

Findings

01. Agents trained interactively outperform statically trained models on scientific reasoning tasks.

02. Small, interactively trained agents can surpass much larger models in scientific reasoning.

03. The results support the hypothesis that agents need grounding in interactive environments to develop reusable reasoning capabilities.

Abstract

We present ScienceWorld, a benchmark to test agents' scientific reasoning abilities in a new interactive text environment at the level of a standard elementary school science curriculum. Despite the transformer-based progress seen in question-answering and scientific text processing, we find that current models cannot reason about or explain learned science concepts in novel contexts. For instance, models can easily answer what the conductivity of a known material is but struggle when asked how they would conduct an experiment in a grounded environment to find the conductivity of an unknown material. This begs the question of whether current models are simply retrieving answers by way of seeing a large number of similar examples or if they have learned to reason about concepts in a reusable manner. We hypothesize that agents need to be grounded in interactive environments to achieve…

Code & Models

Repositories

allenai/scienceworld (Official)
