Document-Level Numerical Reasoning across Single and Multiple Tables in Financial Reports
Yi-Cheng Wang, Wei-An Wang, Chu-Song Chen

TL;DR
This paper introduces FinLongDocQA, a dataset for numerical reasoning in long financial reports, and proposes FinLongDocAgent, a multi-agent retrieval-augmented method to improve question answering accuracy.
Contribution
The paper presents a new dataset for cross-table numerical reasoning in financial reports and a novel multi-agent RAG approach to enhance reasoning accuracy over long documents.
Findings
LLMs struggle with long financial reports, which often exceed 129k tokens.
Iterative retrieval and verification significantly improve numerical QA accuracy.
The proposed FinLongDocAgent outperforms baseline methods in experiments.
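To make the cross-table setting and the retrieve-then-verify idea in the findings above more concrete, here is a minimal toy sketch. It is not the FinLongDocAgent implementation: the chunk layout, the field names, the `retrieve` and `answer_ratio` helpers, and all numbers are hypothetical and chosen only for illustration. The loop retrieves one table at a time, checks whether every operand needed for the calculation has been found, and only then performs the arithmetic.

```python
from typing import Dict, List, Optional

# Hypothetical toy "report": each chunk stands in for one table.
# All table names, field names, and values are made up for illustration.
CHUNKS: List[Dict] = [
    {"table": "income_statement", "values": {"net_income": 120.0, "revenue": 1000.0}},
    {"table": "balance_sheet", "values": {"total_equity": 600.0, "total_assets": 1500.0}},
    {"table": "cash_flow", "values": {"operating_cash_flow": 200.0}},
]


def retrieve(field: str, chunks: List[Dict]) -> Optional[Dict]:
    """Return the first chunk whose table contains the requested field."""
    for chunk in chunks:
        if field in chunk["values"]:
            return chunk
    return None


def answer_ratio(numerator: str, denominator: str, chunks: List[Dict],
                 max_rounds: int = 4) -> Optional[float]:
    """Iteratively gather operands, verify them, then compute the ratio."""
    evidence: Dict[str, float] = {}
    for _ in range(max_rounds):
        # Verification step: which operands are still missing?
        missing = [f for f in (numerator, denominator) if f not in evidence]
        if not missing:
            break
        # Retrieval step: fetch one more table for the first missing operand.
        chunk = retrieve(missing[0], chunks)
        if chunk is None:
            return None  # The operand is not in the document; refuse to guess.
        evidence.update(chunk["values"])
    # Final check before doing any arithmetic.
    if any(f not in evidence for f in (numerator, denominator)):
        return None
    if evidence[denominator] == 0:
        return None
    return evidence[numerator] / evidence[denominator]


# Cross-table question: return on equity needs two different tables.
roe = answer_ratio("net_income", "total_equity", CHUNKS)
# Single-table question: net margin needs only the income statement.
margin = answer_ratio("net_income", "revenue", CHUNKS)
```

In this sketch the cross-table question takes two retrieval rounds, while the single-table one is resolved in a single round; a question whose operand appears in no table returns `None` rather than a fabricated number.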
Abstract
Despite the strong language understanding abilities of large language models (LLMs), they still struggle with reliable question answering (QA) over long, structured documents, particularly for numerical reasoning. Financial annual reports exemplify this difficulty: financial statement analysis often hinges on accurate arithmetic, and analysts derive key indicators by integrating evidence scattered across multiple tables and narrative text. However, existing benchmarks focus largely on single-table settings, leaving cross-table document-level numerical reasoning underexplored. To address this gap, we introduce FinLongDocQA, a dataset for both single-table and cross-table financial numerical reasoning in long-context reports. Evaluating both closed-source and open-source LLMs on FinLongDocQA reveals two bottlenecks: (1) annual reports often exceed 129k tokens, exacerbating the context rot…
