Mathematical Capabilities of ChatGPT

Simon Frieder; Luca Pinchetti; Alexis Chevalier; Ryan-Rhys Griffiths; Tommaso Salvatori; Thomas Lukasiewicz; Philipp Christian Petersen; Julius Berner

arXiv:2301.13867 · cs.LG · December 12, 2023 · 297 citations

TL;DR

This study evaluates the mathematical reasoning capabilities of ChatGPT and GPT-4 on graduate-level datasets. The models are useful as fact-finding tools, but their overall performance falls below that of graduate students.

Contribution

The paper introduces two new datasets, GHOSTS and miniGHOSTS, curated by mathematicians to assess language models on graduate-level mathematics and reasoning.

Findings

01. ChatGPT is effective when queried for mathematical facts, acting as a knowledge base.

02. GPT-4 performs well on undergraduate-level mathematics but struggles with graduate-level problems.

03. Both models perform below the level of an average graduate student.

Abstract

We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets…

Videos

Mathematical Capabilities of ChatGPT · SlidesLive

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)

Methods: Balanced Selection