FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents
Eric Y. Kim, Jie Huang

TL;DR
FinRetrieval introduces a comprehensive benchmark for evaluating AI agents' ability to accurately retrieve specific financial data from structured sources, highlighting the impact of tool access and reasoning modes.
Contribution
This work presents the first benchmark dataset and evaluation framework for financial data retrieval by AI agents, including diverse configurations and detailed tool execution traces.
Findings
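Tool availability dominates retrieval accuracy: Claude Opus reaches 90.8% with structured data APIs but only 19.8% with web search alone.
Reasoning mode benefits vary inversely with base model capability (+9.0pp for OpenAI vs. +2.8pp for Claude).
Geographic performance gaps (a 5.6pp US advantage) are due to fiscal year naming conventions.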
Abstract
AI agents increasingly assist with financial research, yet no benchmark evaluates their ability to retrieve specific numeric values from structured databases. We introduce FinRetrieval, a benchmark of 500 financial retrieval questions with ground truth answers, agent responses from 14 configurations across three frontier providers (Anthropic, OpenAI, Google), and complete tool call execution traces. Our evaluation reveals that tool availability dominates performance: Claude Opus achieves 90.8% accuracy with structured data APIs but only 19.8% with web search alone--a 71 percentage point gap that exceeds other providers by 3-4x. We find that reasoning mode benefits vary inversely with base capability (+9.0pp for OpenAI vs +2.8pp for Claude), explained by differences in base-mode tool utilization rather than reasoning ability. Geographic performance gaps (5.6pp US advantage) stem from…
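The headline comparisons in the abstract are percentage-point (pp) differences between accuracies, not relative changes. The sketch below reproduces that arithmetic from the reported figures. The helper names `accuracy_pct` and `pp_gap` are hypothetical, and the correct-answer counts (454 and 99 out of 500) are back-calculated on the assumption that every one of the 500 questions is scored; the paper's own evaluation code may differ.

```python
def accuracy_pct(num_correct: int, num_questions: int) -> float:
    """Accuracy as a percentage of questions answered correctly."""
    if num_questions <= 0:
        raise ValueError("num_questions must be positive")
    return 100.0 * num_correct / num_questions


def pp_gap(acc_a_pct: float, acc_b_pct: float) -> float:
    """Percentage-point difference (a minus b), rounded to one decimal."""
    return round(acc_a_pct - acc_b_pct, 1)


# Assumed counts: 90.8% and 19.8% of 500 questions (not stated in the source).
with_apis = accuracy_pct(454, 500)   # structured data APIs
web_only = accuracy_pct(99, 500)     # web search alone

tool_gap = pp_gap(with_apis, web_only)

# Reasoning-mode gains reported in the abstract, in percentage points.
reasoning_gain_openai = 9.0
reasoning_gain_claude = 2.8
gain_difference = pp_gap(reasoning_gain_openai, reasoning_gain_claude)

print(f"Tool-access gap: {tool_gap} pp")
print(f"Reasoning gain difference: {gain_difference} pp")
```

A percentage-point gap is an absolute difference: moving from 19.8% to 90.8% is a 71pp gain, while the relative increase is much larger, so the two measures should not be used interchangeably.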
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Reporting and XBRL · FinTech, Crowdfunding, Digital Finance
