SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations

Amila Indika; Igor Molybog

arXiv:2510.19864·cs.SE·October 24, 2025

TL;DR

This paper introduces SODBench, a benchmark for evaluating how well large language models generate human-readable documentation of spreadsheet operations, with the goal of improving automation and knowledge transfer.

Contribution

The paper presents a benchmark of 111 spreadsheet manipulation code snippets, each paired with a natural language summary, and evaluates five LLMs on the task of spreadsheet operation documentation.

Findings

01. LLMs can generate accurate spreadsheet documentation

02. GPT-4o outperforms other models in the benchmark

03. SOD is feasible for improving spreadsheet reproducibility and collaboration

Abstract

Numerous knowledge workers utilize spreadsheets in business, accounting, and finance. However, a lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have utilized Large Language Models (LLMs) for generating spreadsheet manipulation code; however, translating that code into natural language for SOD is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs, GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B, using BLEU, GLEU, ROUGE-L, and METEOR…
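
To make the evaluation setup more concrete, the following is a minimal sketch of how one of the listed metrics, ROUGE-L, could be computed between a reference summary and a model-generated one. It assumes lowercased whitespace tokenization and a balanced F1 score; the function names and the example sentences are hypothetical and are not taken from the benchmark, and the paper's own scoring implementation may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 (assumed beta = 1) over lowercased whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference/candidate pair, for illustration only.
reference = "Sum the values in column B and write the total to cell B10"
candidate = "Write the sum of column B to cell B10"
score = rouge_l_f1(reference, candidate)
print(f"ROUGE-L F1: {score:.3f}")
```

ROUGE-L rewards in-order token overlap, so a summary that mentions the right cells and operations in a different order scores lower than one that mirrors the reference phrasing.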

Taxonomy

Topics: Spreadsheets and End-User Computing · Personal Information Management and User Behavior · Software Engineering Research