Improved LLM Agents for Financial Document Question Answering

Nelvin Tan; Zian Seng; Liang Zhang; Yu-Ching Shih; Dong Yang; Amol Salunkhe

arXiv:2506.08726·cs.CL·January 8, 2026


TL;DR

This paper introduces an improved critic agent and a calculator agent for LLM-based question answering over financial documents. The improved critic agent outperforms the traditional critic agent when oracle labels are unavailable, and the calculator agent outperforms program-of-thought while being safer.

Contribution

The paper presents an improved critic agent that works when oracle labels are not available, together with a calculator agent that outperforms the previous state-of-the-art approach (program-of-thought).

Findings

01

The improved critic agent outperforms traditional critic agents without oracle labels.

02

The calculator agent surpasses the previous state-of-the-art approach.

03

Agent interactions significantly influence overall performance.

Abstract

Large language models (LLMs) have shown impressive capabilities on numerous natural language processing tasks. However, LLMs still struggle with numerical question answering for financial documents that include tabular and textual data. Recent works have shown the effectiveness of critic agents (i.e., self-correction) for this task given oracle labels. Building upon this framework, this paper examines the effectiveness of the traditional critic agent when oracle labels are not available, and shows, through experiments, that this critic agent's performance deteriorates in this scenario. With this in mind, we present an improved critic agent, along with a calculator agent that outperforms the previous state-of-the-art approach (program-of-thought) and is safer. Furthermore, we investigate how our agents interact with each other, and how this interaction affects their performance.
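To illustrate why a calculator-style tool can be safer than program-of-thought, the sketch below evaluates an arithmetic expression through a whitelist of operators instead of executing model-generated code. This is not the paper's implementation, which the abstract does not describe; the function name `safe_calculate` and the choice of supported operators are assumptions made purely for illustration.

```python
import ast
import operator

# Hypothetical whitelist: only basic arithmetic is allowed.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _evaluate(node):
    """Recursively evaluate a parsed expression, rejecting anything non-arithmetic."""
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it).
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> float:
    """Evaluate an arithmetic string without calling eval() or exec()."""
    tree = ast.parse(expression, mode="eval")
    return float(_evaluate(tree.body))


if __name__ == "__main__":
    # Example: percentage change between two reported figures.
    print(safe_calculate("(120.5 - 100.0) / 100.0 * 100"))
```

Under this design, an LLM would only need to emit an expression string such as `"(120.5 - 100.0) / 100.0 * 100"`. Function calls, imports, and attribute access are rejected with a `ValueError` rather than executed, which is the kind of restriction that running arbitrary generated programs does not provide.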
