NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan

TL;DR
NumGLUE is a multi-task benchmark that evaluates AI systems' ability to perform simple arithmetic reasoning across eight tasks. It exposes the limitations of current models and shows that sharing knowledge across tasks through joint training improves performance.
Contribution
This paper introduces NumGLUE, a multi-task benchmark for simple arithmetic reasoning, highlighting the challenges faced by neural models and demonstrating benefits of joint training across tasks.
Findings
Neural models perform significantly worse than humans on NumGLUE.
Joint training across tasks improves performance by an average of 3.4%.
State-of-the-art models are 46.4% below human performance.
Abstract
Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brittle, failing to perform the underlying mathematical reasoning when they appear in a slightly different scenario. Drawing inspiration from GLUE, which was proposed in the context of natural language understanding, we propose NumGLUE, a multi-task benchmark that evaluates the performance of AI systems on eight different tasks that, at their core, require simple arithmetic understanding. We show that this benchmark is far from being solved, with neural models, including state-of-the-art large-scale language models, performing significantly worse than humans (lower by 46.4%). Further, NumGLUE promotes sharing knowledge across tasks, especially those with limited…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
