FinSheet-Bench: From Simple Lookups to Complex Reasoning, Where LLMs Break on Financial Spreadsheets
Jan Ravnik, Matjaž Ličen, Felix Bührmann, Bithiah Yuan, Felix Stinson, Tanvi Singh

TL;DR
This paper introduces FinSheet-Bench, a synthetic benchmark for evaluating large language models on complex financial spreadsheet reasoning, revealing current models' limitations in accuracy and robustness for professional finance tasks.
Contribution
The paper presents a new benchmark dataset for financial spreadsheet reasoning and evaluates multiple LLMs, highlighting their performance gaps and the need for specialized architectural approaches.
Findings
No model achieves error rates low enough for professional use.
Performance drops significantly on larger, more complex spreadsheets.
Models show consistent difficulty patterns across different spreadsheet complexities.
Abstract
While Large Language Models (LLMs) can accelerate text-heavy tasks in alternative investment due diligence, a gap remains in their ability to accurately extract and reason over structured tabular data from complex financial spreadsheets. Progress is held back by the lack of real industry fund portfolio datasets for benchmarking, as private equity data rooms are confidential. To address this, we introduce FinSheet-Bench, a benchmark of synthetic financial portfolio data modeled on real private equity fund structures, designed to evaluate LLM performance on text-serialized spreadsheet question answering and numeric reasoning tasks. Our evaluation of ten model configurations from OpenAI, Google, and Anthropic on financial spreadsheets, including complex layouts, fund dividers, and multi-line column names, reveals that no standalone model achieves error rates low enough for unsupervised…
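The abstract describes two mechanics without spelling them out: spreadsheets with fund dividers and multi-line column names are turned into text before an LLM sees them, and answers are scored by error rate. The sketch below is a minimal illustration under stated assumptions, not the benchmark's actual pipeline. The function names `serialize_sheet`, `is_correct` and `error_rate`, the pipe-delimited row format, the `## ` divider marker, the 1% relative tolerance and the toy portfolio rows are all hypothetical choices made for this example.

```python
import math
from typing import List, Optional


def serialize_sheet(header_rows: List[List[str]],
                    rows: List[List[Optional[str]]],
                    sep: str = " | ") -> str:
    """Hypothetical text serialization of a spreadsheet region.

    Stacked header lines are merged per column, e.g. "Invested" over
    "Capital ($m)" becomes "Invested Capital ($m)". A row with only its
    first cell filled is treated as a fund divider (an assumption).
    """
    n_cols = max(len(r) for r in header_rows + rows)
    headers = []
    for c in range(n_cols):
        # Keep the non-empty header fragments for this column, top to bottom.
        parts = [r[c].strip() for r in header_rows
                 if c < len(r) and r[c] and r[c].strip()]
        headers.append(" ".join(parts) if parts else f"col_{c}")
    lines = [sep.join(headers)]
    for row in rows:
        # Pad short rows and replace empty (None) cells with "".
        cells = [(row[c] or "") if c < len(row) else "" for c in range(n_cols)]
        if cells[0] and not any(cells[1:]):
            lines.append(f"## {cells[0]}")  # fund divider row
        else:
            lines.append(sep.join(cells))
    return "\n".join(lines)


def is_correct(pred: str, gold: float, rel_tol: float = 1e-2) -> bool:
    """Hypothetical numeric match: strip '$' and ',' then compare with a relative tolerance."""
    try:
        value = float(pred.replace(",", "").replace("$", "").strip())
    except ValueError:
        return False  # non-numeric answers count as errors
    return math.isclose(value, gold, rel_tol=rel_tol)


def error_rate(preds: List[str], golds: List[float], rel_tol: float = 1e-2) -> float:
    """Fraction of answers that do not match the gold value."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    wrong = sum(not is_correct(p, g, rel_tol) for p, g in zip(preds, golds))
    return wrong / len(golds)


if __name__ == "__main__":
    # Toy, invented portfolio rows (not data from FinSheet-Bench).
    header_rows = [["Company", "Invested", "Current"],
                   ["", "Capital ($m)", "Value ($m)"]]
    rows = [["Fund I", None, None], ["Alpha Ltd", "12.5", "18.0"],
            ["Fund II", None, None], ["Beta GmbH", "7.0", "6.2"]]
    print(serialize_sheet(header_rows, rows))
    print(error_rate(["18.0", "$6,200"], [18.0, 6.2]))
```

In this toy run the second answer is off by a unit-scale factor (6,200 instead of 6.2), so the error rate is 0.5. A tolerance-based match is one common choice for numeric QA; exact string match would be stricter.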
Taxonomy
Topics: Spreadsheets and End-User Computing · FinTech, Crowdfunding, Digital Finance · Data Quality and Management
