SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Haoyu Dong; Jianbo Zhao; Yuzhang Tian; Junyu Xiong; Shiyu Xia; Mengyu Zhou; Yun Lin; José Cambronero; Yeye He; Shi Han; Dongmei Zhang

arXiv:2407.09025 · cs.AI · April 3, 2025 · 2 cites

TL;DR

SpreadsheetLLM introduces an encoding framework that helps large language models understand and reason about spreadsheets, compressing sheets to fit token limits while improving table detection and question answering.

Contribution

The paper presents SheetCompressor, a novel encoding method that effectively compresses spreadsheets for LLMs, enabling enhanced understanding and reasoning capabilities.

Findings

1. Outperforms vanilla serialization by 25.6% in table detection

2. Achieves a 78.9% F1 score in spreadsheet table detection with a fine-tuned LLM

3. Provides an average compression ratio of 25 times

Abstract

Spreadsheets are characterized by their extensive two-dimensional grids, flexible layouts, and varied formatting options, which pose significant challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in…
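To make the encoding ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It contrasts a vanilla cell-by-cell serialization with a simple inverted index that groups addresses by value and skips empty cells. The function names (`col_letter`, `vanilla_serialize`, `inverted_index`), the `|` and `,` separators, and the sample grid are all assumptions made for illustration; cell formats are omitted for brevity.

```python
from collections import defaultdict


def col_letter(index):
    """Convert a 0-based column index to a column label (0 -> A, 26 -> AA)."""
    label = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def vanilla_serialize(grid):
    """Emit one 'address,value' entry per cell, row by row, empty cells included.

    Hypothetical format: entries are joined with '|'.
    """
    parts = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            parts.append(f"{col_letter(c)}{r + 1},{value}")
    return "|".join(parts)


def inverted_index(grid):
    """Map each distinct non-empty value to the list of addresses holding it."""
    index = defaultdict(list)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != "":
                index[value].append(f"{col_letter(c)}{r + 1}")
    return dict(index)


# Illustrative data only; not taken from the paper.
grid = [
    ["Year", "Region", "Sales"],
    ["2023", "East", "100"],
    ["2023", "West", ""],
]

print(vanilla_serialize(grid))
print(inverted_index(grid))
```

In this sketch the vanilla form spends an entry on every cell, including the empty `C3`, while the inverted form stores the repeated value `2023` once with both of its addresses and drops the empty cell entirely. That is the intuition behind trading a cell-ordered layout for a value-keyed one when token budgets are tight.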

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling