Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning
Boxiang Zhao, Qince Li, Zhonghao Wang, Yi Wang, Peng Cheng, Bo Lin

TL;DR
This paper introduces the Cognitive Complexity Benchmark (CCB) and a neuro-symbolic framework called Financial-PoT to improve the robustness of large language models in financial quantitative reasoning, addressing arithmetic hallucinations and reasoning failures.
Contribution
The paper presents a new evaluation benchmark (CCB) for diagnosing reasoning degradation and proposes the Financial-PoT framework with architectural decoupling to enhance reasoning accuracy in financial tasks.
Findings
Financial-PoT improves accuracy from 59.7% to 67.3%.
The approach achieves up to 10-fold gains in high-complexity reasoning tasks.
Architectural decoupling enhances reliability in financial reasoning.
Abstract
While Large Language Models excel at semantic tasks, they face a critical bottleneck in financial quantitative reasoning, frequently suffering from "Arithmetic Hallucinations" and a systemic failure mode we term "Cognitive Collapse". To strictly quantify this phenomenon, we introduce the Cognitive Complexity Benchmark (CCB), a robust evaluation framework grounded in a dataset constructed from 95 real-world Chinese A-share annual reports. Unlike traditional datasets, the CCB stratifies financial queries into a three-dimensional taxonomy (Data Source, Mapping Difficulty, and Result Unit), enabling the precise diagnosis of reasoning degradation in high-cognitive-load scenarios. To address these failures, we propose the Iterative Dual-Phase Financial-PoT framework. This neuro-symbolic architecture enforces a strict architectural decoupling: it first isolates semantic variable extraction and…
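The decoupling the abstract describes, where variable extraction is kept separate from the arithmetic itself, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract is truncated here, so the function names `extract_variables` and `net_margin_percent`, the question, and the figures are all hypothetical, and the extraction phase is a hard-coded stand-in for what a language model would return.

```python
from decimal import Decimal

# Phase 1 (hypothetical stand-in): a language model would read the report
# and return named variables. Here the values are hard-coded for illustration
# and do not come from the paper or from any real annual report.
def extract_variables(question: str) -> dict[str, Decimal]:
    return {
        "net_profit": Decimal("120.5"),
        "revenue": Decimal("964.0"),
    }

# Phase 2: the arithmetic runs as ordinary deterministic code, so the
# numeric result does not depend on the model's ability to calculate.
def net_margin_percent(variables: dict[str, Decimal]) -> Decimal:
    ratio = variables["net_profit"] / variables["revenue"]
    return (ratio * 100).quantize(Decimal("0.01"))

variables = extract_variables("What was the net profit margin?")
print(net_margin_percent(variables))  # prints 12.50
```

Using `Decimal` rather than binary floats is a design choice of this sketch, made because financial figures are usually quoted to a fixed number of decimal places.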
Taxonomy
Topics: Stock Market Forecasting Methods · Ferroelectric and Negative Capacitance Devices · Benford’s Law and Fraud Detection
