Math Natural Language Inference: this should be easy!
Valeria de Paiva, Qiyue Gao, Hai Hu, Pavel Kovalev, Yikang Liu, Lawrence S. Moss, Zhiheng Qian

TL;DR
This paper evaluates whether large language models can perform natural language inference on mathematical texts, revealing both their potential and current limitations in understanding complex mathematical language.
Contribution
It introduces a new Math NLI corpus with human-labeled and LLM-generated hypotheses, and assesses LLM performance and consistency in mathematical inference tasks.
Findings
In some settings, a majority vote of LLMs is approximately equivalent to human-labeled data
Models still struggle with basic mathematical inferences
Current models are less prone to hypothesis-only inference
Abstract
We ask whether contemporary LLMs are able to perform natural language inference (NLI) tasks on mathematical texts. We call this the Math NLI problem. We construct a corpus of Math NLI pairs whose premises are from extant mathematical text and whose hypotheses and gold labels were provided by people with experience in both research-level mathematics and the NLI field. We also investigate the quality of corpora using the same premises but whose hypotheses are provided by LLMs themselves. We investigate not only the performance but also the inter-group consistency of a diverse group of LLMs. We have both positive and negative findings. Among our positive findings: in some settings, using a majority vote of LLMs is approximately equivalent to using human-labeled data in the Math NLI area. On the negative side: LLMs still struggle with mathematical language. They occasionally fail…
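The majority-vote idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `majority_vote` and `agreement_with_gold`, the three-way label set (entailment, contradiction, neutral), and the tie-handling rule are all assumptions made here for illustration.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the voters, or None on a tie.

    `labels` holds one predicted NLI label per model (hypothetical setup).
    """
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # Assumption: a tie for first place is treated as "no decision".
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def agreement_with_gold(votes_per_pair, gold_labels):
    """Fraction of premise-hypothesis pairs whose voted label equals the gold label."""
    if not gold_labels:
        return 0.0
    hits = sum(
        1 for votes, gold in zip(votes_per_pair, gold_labels)
        if majority_vote(votes) == gold
    )
    return hits / len(gold_labels)

# Made-up votes from three hypothetical models on two pairs.
votes = [
    ["entailment", "entailment", "neutral"],
    ["contradiction", "neutral", "entailment"],  # three-way tie, no decision
]
gold = ["entailment", "contradiction"]
score = agreement_with_gold(votes, gold)
```

Treating ties as undecided is only one possible choice. A real evaluation could instead break ties at random or defer to a designated model, and that choice can change the measured agreement.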
Taxonomy
Topics: Mathematics, Computing, and Information Processing
