From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding
Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul

TL;DR
This paper introduces FRTR, a retrieval-augmented multimodal framework that improves reasoning accuracy over large, multimodal enterprise spreadsheets relative to prior methods.
Contribution
The paper presents FRTR, a retrieval-augmented multimodal reasoning framework, and FRTR-Bench, a large-scale benchmark for complex spreadsheet understanding. Together they address the scalability and multimodal integration challenges of reasoning over enterprise workbooks.
Findings
FRTR achieves 74% accuracy on FRTR-Bench with Claude Sonnet 4.5.
FRTR attains 87% accuracy on SpreadsheetLLM with GPT-5.
FRTR reduces token usage by about 50% compared to serialization methods.
Abstract
Large Language Models (LLMs) struggle to reason over large-scale enterprise spreadsheets containing thousands of numeric rows, multiple linked sheets, and embedded visual content such as charts and receipts. Prior state-of-the-art spreadsheet reasoning approaches typically rely on single-sheet compression or full-context encoding, which limits scalability and fails to reflect how real users interact with complex, multimodal workbooks. We introduce FRTR-Bench, the first large-scale benchmark for multimodal spreadsheet reasoning, comprising 30 enterprise-grade Excel workbooks spanning nearly four million cells and more than 50 embedded images. To address these challenges, we present From Rows to Reasoning (FRTR), an advanced, multimodal retrieval-augmented generation framework that decomposes Excel workbooks into granular row, column, and block embeddings, employs hybrid lexical-dense…
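The decomposition into row, column, and block units described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `chunk_sheet`, the `block_size` parameter, the chunk ID scheme, and the pipe-separated text format are all hypothetical choices made for illustration. The sketch assumes a rectangular in-memory grid and stops before any embedding or retrieval step.

```python
from typing import Dict, List

Grid = List[List[object]]


def chunk_sheet(sheet_name: str, grid: Grid, block_size: int = 2) -> List[Dict[str, str]]:
    """Split a rectangular grid into row, column, and block text chunks.

    Hypothetical illustration only; the chunk format is an assumption,
    not something specified by the paper.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0

    def cell_text(value: object) -> str:
        # Represent empty cells as empty strings.
        return "" if value is None else str(value)

    chunks: List[Dict[str, str]] = []

    # One chunk per row.
    for r in range(n_rows):
        text = " | ".join(cell_text(v) for v in grid[r])
        chunks.append({"id": f"{sheet_name}!row{r + 1}", "kind": "row", "text": text})

    # One chunk per column.
    for c in range(n_cols):
        text = " | ".join(cell_text(grid[r][c]) for r in range(n_rows))
        chunks.append({"id": f"{sheet_name}!col{c + 1}", "kind": "column", "text": text})

    # One chunk per block_size x block_size window (edge windows may be smaller).
    for r0 in range(0, n_rows, block_size):
        for c0 in range(0, n_cols, block_size):
            lines = [
                " | ".join(cell_text(v) for v in row[c0:c0 + block_size])
                for row in grid[r0:r0 + block_size]
            ]
            chunks.append(
                {"id": f"{sheet_name}!block{r0 + 1}_{c0 + 1}", "kind": "block", "text": "\n".join(lines)}
            )

    return chunks


if __name__ == "__main__":
    demo = [["Region", "Sales"], ["East", 100], ["West", 250]]
    for chunk in chunk_sheet("Q1", demo):
        print(chunk["id"], "->", repr(chunk["text"]))
```

In a retrieval pipeline, each chunk's text would then be embedded and indexed so that a query can pull back only the relevant rows, columns, or blocks rather than the whole workbook.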
Taxonomy
Topics: Spreadsheets and End-User Computing · Data Visualization and Analytics · Information Retrieval and Search Behavior
