Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data
Mubashara Akhtar, Abhilash Shankarampeta, Vivek Gupta, Arpit Patil, Oana Cocarascu, Elena Simperl

TL;DR
This paper provides a detailed taxonomy and evaluation of numerical reasoning skills in language models, revealing their strengths and weaknesses across various reasoning types, especially in tabular data tasks.
Contribution
It introduces a hierarchical taxonomy of numerical reasoning skills, develops diverse probes, and evaluates state-of-the-art models on tabular inference, highlighting their limitations and strengths.
Findings
No model excels across all reasoning types.
FlanT5 and GPT-3.5 show strong overall reasoning skills.
Models often exploit dataset artifacts for predictions.
Abstract
Numbers are crucial for various real-world domains such as finance, economics, and science. Thus, understanding and reasoning with numbers are essential skills for language models to solve different tasks. While different numerical benchmarks have been introduced in recent years, they are limited to specific numerical aspects mostly. In this paper, we propose a hierarchical taxonomy for numerical reasoning skills with more than ten reasoning types across four levels: representation, number sense, manipulation, and complex reasoning. We conduct a comprehensive evaluation of state-of-the-art models to identify reasoning challenges specific to them. Henceforth, we develop a diverse set of numerical probes employing a semi-automated approach. We focus on the tabular Natural Language Inference (TNLI) task as a case study and measure models' performance shifts. Our results show that no model…
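The idea of measuring a performance shift on numerical probes can be illustrated with a minimal sketch. The four level names come from the abstract; everything else is an assumption made for illustration: the shift is taken here to be probe accuracy minus accuracy on the original examples (the abstract does not state the exact metric), and the label strings and toy predictions are hypothetical rather than results from the paper.

```python
from enum import Enum


class Level(Enum):
    # The four taxonomy levels named in the abstract.
    REPRESENTATION = "representation"
    NUMBER_SENSE = "number sense"
    MANIPULATION = "manipulation"
    COMPLEX_REASONING = "complex reasoning"


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def performance_shift(base_preds, base_gold, probe_preds, probe_gold):
    """Assumed metric: probe accuracy minus base accuracy.

    A negative value means the model does worse on the probes.
    """
    return accuracy(probe_preds, probe_gold) - accuracy(base_preds, base_gold)


# Hypothetical toy data, not taken from the paper.
base_gold = ["entail", "contradict", "entail", "contradict"]
base_preds = ["entail", "contradict", "entail", "entail"]        # 3 of 4 correct
probe_gold = ["contradict", "entail", "contradict", "entail"]
probe_preds = ["entail", "entail", "contradict", "contradict"]   # 2 of 4 correct

shift = performance_shift(base_preds, base_gold, probe_preds, probe_gold)
print(Level.MANIPULATION.value, shift)
```

Grouping such shifts by taxonomy level would give one number per level and model, which is one simple way to read a claim like "no model excels across all reasoning types".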
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Dropout · Weight Decay
