BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

Yunseung Lee; Subin Kim; Youngjun Kwak; Jaegul Choo

arXiv:2602.17072 · cs.CL · February 27, 2026

TL;DR

BankMathBench is a new domain-specific benchmark dataset designed to evaluate and improve large language models' numerical reasoning abilities in realistic banking scenarios, covering multi-step calculations and multi-condition reasoning.

Contribution

The paper introduces BankMathBench, a structured dataset with three difficulty levels, specifically targeting banking tasks, and demonstrates its effectiveness in enhancing LLMs' numerical reasoning through tool-augmented fine-tuning.

Findings

1. LLMs show significant accuracy improvements after training on BankMathBench.

2. Models gained up to 75.1 percentage points in accuracy on intermediate-difficulty tasks.

3. BankMathBench provides a reliable benchmark for numerical reasoning in real-world banking.

Abstract

Chatbots based on large language models (LLMs) are increasingly being adopted in the financial domain, particularly in digital banking, to handle customer inquiries about products such as deposits, savings, and loans. However, these models still exhibit low accuracy in core banking computations, including total payout estimation, comparison of products with varying interest rates, and interest calculation under early repayment conditions. Such tasks require multi-step numerical reasoning and contextual understanding of banking products, yet existing LLMs often make systematic errors: misinterpreting product types, applying conditions incorrectly, or failing basic calculations involving exponents and geometric progressions. Existing benchmarks have rarely captured such errors. Mathematical datasets focus on fundamental math problems, whereas financial benchmarks primarily…
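To make concrete why payout estimation and product comparison involve exponents and geometric progressions, here is a minimal sketch of two such calculations. It is not taken from the paper or the BankMathBench dataset; the function names are hypothetical, and it assumes monthly compounding at annual_rate / 12, deposits made at the start of each month, and no taxes, fees, or early termination. Under those assumptions an installment savings plan with monthly deposit P over n months pays out the geometric series sum of P(1 + r)^j for j = 1..n, which has a closed form that is easy to get wrong by one period.

```python
# Illustrative banking arithmetic (hypothetical helpers, not from BankMathBench).
# Assumptions: monthly compounding at annual_rate / 12, deposits made at the
# start of each month, no taxes, fees, or early termination.


def lump_sum_payout(principal: float, annual_rate: float, months: int) -> float:
    """Payout of a one-time deposit compounded monthly: P * (1 + r) ** n."""
    r = annual_rate / 12
    return principal * (1 + r) ** months


def installment_payout(monthly_deposit: float, annual_rate: float, months: int) -> float:
    """Payout of an installment savings plan via the geometric-series closed form.

    The deposit made at the start of month k compounds for (months - k + 1)
    months, so the payout is sum_{j=1}^{n} P * (1 + r) ** j
    = P * (1 + r) * ((1 + r) ** n - 1) / r.
    """
    r = annual_rate / 12
    if r == 0:
        # Degenerate case: the geometric ratio is 1, so no interest accrues.
        return monthly_deposit * months
    return monthly_deposit * (1 + r) * ((1 + r) ** months - 1) / r


def installment_payout_by_loop(monthly_deposit: float, annual_rate: float, months: int) -> float:
    """Month-by-month simulation used to cross-check the closed form."""
    r = annual_rate / 12
    balance = 0.0
    for _ in range(months):
        balance += monthly_deposit  # deposit at the start of the month
        balance *= 1 + r            # interest credited at the end of the month
    return balance


if __name__ == "__main__":
    # Two hypothetical products with the same total contribution of 1,200,000.
    lump = lump_sum_payout(1_200_000, 0.030, 12)
    plan = installment_payout(100_000, 0.045, 12)
    check = installment_payout_by_loop(100_000, 0.045, 12)
    assert abs(plan - check) < 1e-6  # closed form agrees with the simulation
    print(f"lump sum at 3.0%:     {lump:,.0f}")
    print(f"installments at 4.5%: {plan:,.0f}")
    print("better product:", "lump sum" if lump > plan else "installments")
```

With these illustrative numbers the lump sum at the lower rate ends slightly ahead, because each installment earns interest only from its own deposit date; comparing headline rates alone would pick the wrong product. Floats are used here only for readability; a real implementation would more likely use `decimal.Decimal` together with the bank's own rounding, tax, and early-termination rules.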
