SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Junyu Xiong, Shiyu Xia, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang

TL;DR
SpreadsheetLLM introduces an encoding framework that helps large language models understand and reason about spreadsheets. It compresses spreadsheets to fit within token limits and improves performance on table detection and question answering tasks.
Contribution
The paper presents SheetCompressor, an encoding framework that compresses spreadsheets for LLMs through three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation.
Findings
Outperforms vanilla serialization by 25.6% in table detection
Achieves a 78.9% F1 score in spreadsheet table detection
Achieves an average compression ratio of 25×
Abstract
Spreadsheets are characterized by their extensive two-dimensional grids, flexible layouts, and varied formatting options, which pose significant challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in…
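The contrast between the vanilla serialization and an inverted, value-keyed encoding can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy `sheet`, the `address,value` and `value:addresses` text formats, and the helper names are all assumptions made for illustration, formats are omitted entirely, and character count stands in as a rough proxy for token count.

```python
from collections import defaultdict

# Hypothetical toy sheet: cell address -> displayed value (formats omitted).
sheet = {
    "A1": "Region", "B1": "Sales",
    "A2": "North",  "B2": "100",
    "A3": "South",  "B3": "100",
    "A4": "",       "B4": "",
}

def vanilla_serialize(cells):
    """Emit every cell, including empty ones, as 'address,value' (assumed format)."""
    return "|".join(f"{addr},{val}" for addr, val in cells.items())

def inverted_index(cells):
    """Group addresses by shared value and skip empty cells."""
    index = defaultdict(list)
    for addr, val in cells.items():
        if val != "":
            index[val].append(addr)
    return dict(index)

def serialize_index(index):
    """Emit each distinct value once, followed by all addresses holding it."""
    return "|".join(f"{val}:{','.join(addrs)}" for val, addrs in index.items())

plain = vanilla_serialize(sheet)
index = inverted_index(sheet)
compact = serialize_index(index)

# Character-based compression ratio; a real measurement would count tokens.
ratio = len(plain) / len(compact)
print(plain)
print(compact)
print(f"ratio: {ratio:.2f}")
```

On this toy sheet the saving comes from two sources: empty cells are dropped, and the repeated value `100` is written once. Real spreadsheets with large empty regions and many repeated values would benefit far more than this small example suggests.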
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
