Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning
Banghao Chi, Yining Xie, Mingyuan Wu, Jingcheng Yang, Jize Jiang, Zhaoheng Li, Shengyi Qian, Minjia Zhang, Klara Nahrstedt, Rui Hou, Xiangjun Fan, and Hanchao Yu

TL;DR
Spreadsheet-RL introduces a reinforcement learning framework to train specialized AI agents for complex, real-world spreadsheet tasks within Excel, significantly improving performance over existing methods.
Contribution
The paper presents a novel RL fine-tuning framework, a scalable data collection pipeline, and a comprehensive environment for training and evaluating spreadsheet agents.
Findings
Spreadsheet-RL improves Pass@1 from 12.0% to 23.4% on SpreadsheetBench.
It raises Pass@1 from 8.4% to 17.2% on the Domain-Spreadsheet dataset.
The framework demonstrates strong potential for real-world spreadsheet automation.
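To put the reported figures in perspective, the following is a minimal sketch that computes Pass@1 and the size of the two reported gains. It assumes the common definition of Pass@1 (the share of tasks solved on the first attempt); the paper's exact evaluation protocol is not given in this summary, and the function names `pass_at_1` and `gain` are hypothetical, chosen only for illustration.

```python
def pass_at_1(first_attempt_passed):
    """Percentage of tasks whose first attempt passed.

    Assumes one boolean per task (common Pass@1 definition,
    not confirmed against the paper's evaluation code).
    """
    if not first_attempt_passed:
        raise ValueError("need at least one task outcome")
    return 100.0 * sum(first_attempt_passed) / len(first_attempt_passed)


def gain(before, after):
    """Return (absolute gain in percentage points, relative factor)."""
    return after - before, after / before


# Pass@1 values as reported in the Findings above (percent).
reported = {
    "SpreadsheetBench": (12.0, 23.4),
    "Domain-Spreadsheet": (8.4, 17.2),
}

for name, (before, after) in reported.items():
    points, factor = gain(before, after)
    print(f"{name}: +{points:.1f} points, {factor:.2f}x the baseline")
```

On both datasets the reported Pass@1 roughly doubles, while the absolute success rate stays below 25%.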
Abstract
Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a promising research direction. Most existing spreadsheet agents rely on specialized prompting over general-purpose LLMs; while this design shows promise on simple spreadsheet operations, it struggles to manage the complex, multi-step workflows typical of real-world applications. We introduce Spreadsheet-RL, a reinforcement learning (RL) fine-tuning framework designed to train specialized spreadsheet agents within a realistic Microsoft Excel environment. Spreadsheet-RL features an automated pipeline for scalable collection of paired start-goal spreadsheets from online forums, as well as…
