Finance Agent Benchmark: Benchmarking LLMs on Real-world Financial Research Tasks

Antoine Bigeard; Langston Nashold; Rayan Krishnan; Shirley Wu

arXiv:2508.00828 · cs.CE · August 5, 2025

Open Access

TL;DR

The paper introduces the Finance Agent Benchmark, a comprehensive dataset and evaluation framework for testing large language models on real-world financial research tasks involving SEC filings and financial analysis.

Contribution

The paper presents a new benchmark of expert-authored, expert-validated questions, together with an agentic harness that equips LLMs with tools, and uses it to expose the current limitations of AI in financial analysis tasks.

Findings

01. Best model achieved 46.8% accuracy

02. Average cost per query was $3.79

03. Significant room for improvement in AI financial capabilities

Abstract

Artificial Intelligence (AI) technology has emerged as a transformative force in financial analysis and the finance industry, though significant questions remain about the full capabilities of Large Language Model (LLM) agents in this domain. We present the Finance Agent Benchmark, featuring challenging and diverse real-world finance research problems that require LLMs to perform complex analysis using recent SEC filings. We construct the benchmark using a taxonomy of nine financial task categories, developed in consultation with experts from banks, hedge funds, and private equity firms. The dataset includes 537 expert-authored questions covering tasks from information retrieval to complex financial modeling, each validated through a rigorous review process to ensure accuracy and relevance. Moreover, we implement an agentic harness that equips LLMs with tools sufficient to produce…

Taxonomy

Topics: Stock Market Forecasting Methods · FinTech, Crowdfunding, Digital Finance · Explainable Artificial Intelligence (XAI)