SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks
Srivatsa Kundurthy, Clara Na, Michael Handley, Zach Kirshner, Chen Bo Calvin Zhang, Manasi Sharma, Emma Strubell, John Ling

TL;DR
SpreadsheetArena introduces a platform for evaluating LLMs' ability to generate structured spreadsheet artifacts that meet diverse user constraints, highlighting the variability and challenges in aligning generated spreadsheets with domain-specific standards.
Contribution
We present SpreadsheetArena, a novel evaluation platform for assessing LLM-generated spreadsheets, emphasizing the complexity and variability of preferences across different use cases.
Findings
Preferences for spreadsheet features vary across use cases.
Even top-ranked models often do not align with domain-specific best practices.
Evaluation reveals significant variability in stylistic, structural, and functional features.
Abstract
Large language models (LLMs) are increasingly tasked with producing and manipulating structured artifacts. We consider the task of end-to-end spreadsheet generation, where language models are prompted to produce spreadsheet artifacts to satisfy users' explicit and implicit constraints, specified in natural language. We introduce SpreadsheetArena, a platform for evaluating models' performance on the task via blind pairwise evaluations of LLM-generated spreadsheet workbooks. As with other complex, open-ended tasks, relevant evaluation criteria can vary substantially across use cases and prompts, often in ways that are difficult to formalize. Compared to general chat or text generation settings, spreadsheet generation presents unique challenges and opportunities: the task output structure is well-defined and multi-dimensional, and there are often complex considerations around interactivity…
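The abstract does not say how blind pairwise votes become the model ranking that the Findings refer to ("top-ranked models"). Arena-style evaluations commonly fit a Bradley-Terry model to pairwise outcomes. The Python sketch below illustrates that technique under this assumption; it is not SpreadsheetArena's documented method. The function name `bradley_terry`, the `(winner, loser)` vote format, and the model names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def bradley_terry(votes, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical aggregation step, not taken from the paper. It uses the
    standard minorization-maximization (MM) update:
        p_i <- W_i / sum_j n_ij / (p_i + p_j)
    where W_i is model i's total wins and n_ij is the number of i-vs-j votes.
    Ties are not modeled. Every model needs at least one win, or its
    strength collapses to 0.
    """
    models = sorted(set(chain.from_iterable(votes)))
    wins = defaultdict(int)   # total wins per model
    games = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            # Pairs that never met have games == 0 and contribute nothing.
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            new[i] = wins[i] / denom if denom else strength[i]
        # Strengths are identifiable only up to scale, so rescale to mean 1.
        total = sum(new.values())
        strength = {m: s * len(models) / total for m, s in new.items()}
    return strength
```

Sorting models by fitted strength gives a leaderboard. Unlike raw win rates, the fitted strengths take into account which opponents each model was compared against.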
Taxonomy
Topics: Spreadsheets and End-User Computing · Scientific Computing and Data Management · Software Engineering Research
