Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE

Simon Thorne

arXiv:2506.17330·cs.SE·June 24, 2025

TL;DR

This paper introduces FLARE, a benchmark framework for evaluating large language models on spreadsheet tasks. The evaluated models handle simple tasks well but struggle with complex, multi-step reasoning, which points to a need for stronger logical capabilities.

Contribution

The paper presents FLARE, a novel benchmark for assessing LLMs on spreadsheet functions and reasoning, and provides insights into their current limitations in complex tasks.

Findings

01 LLMs perform well on simple spreadsheet tasks

02 LLMs often produce incorrect outputs in complex, multi-step operations

03 Current LLMs need enhanced logical reasoning for spreadsheet applications

Abstract

Large Language Models (LLMs) have demonstrated significant capabilities across various domains; however, their effectiveness in spreadsheet-related tasks remains underexplored. This study introduces a foundation for a comprehensive benchmark framework to evaluate the performance of leading LLMs in executing spreadsheet functions, formula generation, and data manipulation tasks. The benchmark encompasses tasks ranging from basic formula creation to complex, real-world spreadsheet scenarios. Our findings reveal that while LLMs exhibit proficiency in straightforward tasks, they often falter in complex, multi-step operations, frequently producing plausible yet incorrect outputs. These results underscore the limitations of current LLMs in handling spreadsheet tasks that require precise logical reasoning and highlight the need for integrating symbolic reasoning capabilities into LLM…
