SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations
Amila Indika, Igor Molybog

TL;DR
This paper introduces SODBench, a benchmark for evaluating large language models on generating human-readable documentation of spreadsheet operations, with the aim of improving automation and knowledge transfer.
Contribution
It presents a new benchmark dataset of spreadsheet manipulation code snippets paired with natural language summaries, and it evaluates multiple LLMs on the task of Spreadsheet Operations Documentation (SOD).
Findings
LLMs can generate accurate spreadsheet documentation
GPT-4o outperforms other models in the benchmark
SOD is a feasible approach to improving spreadsheet reproducibility and collaboration
Abstract
Numerous knowledge workers use spreadsheets in business, accounting, and finance. However, the lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations of spreadsheet operations. Many previous studies have used Large Language Models (LLMs) to generate spreadsheet manipulation code; however, translating that code into natural language for SOD is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs (GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B) using BLEU, GLEU, ROUGE-L, and METEOR…
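To make one of the listed metrics concrete, the sketch below computes a sentence-level ROUGE-L F1 score between a generated summary and a reference summary using the longest common subsequence (LCS) of their tokens. It is an illustration of the metric only, not the authors' evaluation code. Assumptions: naive lowercase whitespace tokenization, the F1 form of ROUGE-L (beta = 1), and hypothetical example sentences that are not drawn from SODBench. Established packages apply their own tokenization and may report somewhat different numbers.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (beta = 1); tokenization is a naive lowercase split (assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # fraction of generated tokens covered by the LCS
    recall = lcs / len(ref)      # fraction of reference tokens covered by the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model output and reference summary (illustrative, not from SODBench).
    generated = "Sum the values in column B and write the total to cell B11"
    reference = "Write the sum of column B into cell B11"
    print(round(rouge_l_f1(generated, reference), 3))  # prints 0.455
```

Here the LCS has 5 tokens, the generated summary has 13 and the reference has 9, so F1 = 2 * 5 / (13 + 9) ≈ 0.455. Because ROUGE-L rewards in-order word overlap rather than meaning, two correct paraphrases can still score low, which is one reason to report several metrics side by side.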
Taxonomy
Topics: Spreadsheets and End-User Computing · Personal Information Management and User Behavior · Software Engineering Research
