Solving Quantitative Reasoning Problems with Language Models

Aitor Lewkowycz; Anders Andreassen; David Dohan; Ethan Dyer; Henryk Michalewski; Vinay Ramasesh; Ambrose Slone; Cem Anil; Imanol Schlag; Theo Gutman-Solo; Yuhuai Wu; Behnam Neyshabur; Guy Gur-Ari; Vedant Misra

arXiv:2206.14858 · cs.CL · July 4, 2022 · 281 citations

TL;DR

This paper introduces Minerva, a large language model trained on technical content that significantly improves quantitative reasoning performance on scientific problems without external tools.

Contribution

The paper presents Minerva, a language model pretrained on general natural language data and further trained on technical content, which achieves state-of-the-art results on technical quantitative reasoning benchmarks.

Findings

01 Achieves state-of-the-art performance on technical benchmarks.

02 Correctly answers nearly a third of over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.

03 Reaches these results without the use of external tools.

Abstract

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

Code & Models

Repositories

gair-nlp/abel · PyTorch

Videos

Is Google’s New AI As Smart As A Human? 🤖 · YouTube

Solving Quantitative Reasoning Problems with Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Adam · 1-bit Adam