Math Natural Language Inference: this should be easy!

Valeria de Paiva; Qiyue Gao; Hai Hu; Pavel Kovalev; Yikang Liu; Lawrence S. Moss; Zhiheng Qian

arXiv:2507.23063 · cs.CL · August 1, 2025

TL;DR

This paper evaluates whether large language models can perform natural language inference (NLI) on mathematical texts. In some settings a majority vote of LLMs is approximately equivalent to human-labeled data, but the models still struggle with mathematical language.

Contribution

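It introduces a Math NLI corpus whose premises come from existing mathematical text and whose hypotheses and gold labels were provided by people experienced in both research-level mathematics and NLI. It also examines corpora built on the same premises with LLM-generated hypotheses, and assesses both the performance and the inter-group consistency of a diverse group of LLMs.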

Findings

01. In some settings, a majority vote of LLMs is approximately equivalent to human-labeled data for Math NLI

02. Models still struggle with basic mathematical inferences

03. Current models are less prone to hypothesis-only inference

Abstract

We ask whether contemporary LLMs are able to perform natural language inference (NLI) tasks on mathematical texts. We call this the Math NLI problem. We construct a corpus of Math NLI pairs whose premises are from extant mathematical text and whose hypotheses and gold labels were provided by people with experience in both research-level mathematics and also in the NLI field. We also investigate the quality of corpora using the same premises but whose hypotheses are provided by LLMs themselves. We not only investigate the performance but also the inter-group consistency of the diverse group of LLMs. We have both positive and negative findings. Among our positive findings: in some settings, using a majority vote of LLMs is approximately equivalent to using human-labeled data in the Math NLI area. On the negative side: LLMs still struggle with mathematical language. They occasionally fail…
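The two measurements the abstract mentions, a majority vote over LLM labels and the consistency between models, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the paper: the three-way label set (entailment, contradiction, neutral) is the usual NLI convention, the model names and predictions are invented, ties are left unresolved as `None`, and consistency is measured as plain pairwise agreement rather than whatever statistic the authors used.

```python
from collections import Counter
from itertools import combinations


def majority_vote(votes, tie_label=None):
    """Return the most frequent label, or tie_label if the top two counts are equal."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label  # assumption: ties are left unresolved
    return ranked[0][0]


def pairwise_agreement(predictions):
    """Fraction of items on which each pair of models gives the same label."""
    result = {}
    for a, b in combinations(sorted(predictions), 2):
        pa, pb = predictions[a], predictions[b]
        same = sum(x == y for x, y in zip(pa, pb))
        result[(a, b)] = same / len(pa)
    return result


# Hypothetical predictions from three hypothetical models on four premise-hypothesis pairs.
predictions = {
    "model_a": ["entailment", "neutral", "contradiction", "entailment"],
    "model_b": ["entailment", "neutral", "neutral", "contradiction"],
    "model_c": ["entailment", "contradiction", "contradiction", "neutral"],
}

# One voted label per pair: zip(*...) groups the models' labels item by item.
voted = [majority_vote(item_votes) for item_votes in zip(*predictions.values())]
agreement = pairwise_agreement(predictions)

print(voted)
print(agreement)
```

The voted labels could then be compared against human gold labels item by item; how ties and disagreements are handled is a design choice that would need to follow the paper's own protocol.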

Taxonomy

Topics: Mathematics, Computing, and Information Processing