Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding
Yiming Lei, Yiqi Wang, Yujia Zhang, Bo Guan, Depei Zhu, Chunhui Wang, Zhuonan Hao, Tianyu Shi

TL;DR
This paper introduces Sheet as Token, a graph-enhanced method that treats entire sheets as semantic units for improved multi-sheet spreadsheet retrieval and understanding.
Contribution
It proposes a novel sheet-level tokenization and graph-based reasoning framework that enhances retrieval accuracy over traditional chunk-based methods.
Findings
Sheet-level tokenization learns stable sheet representations.
Graph-enhanced reasoning improves cross-sheet retrieval performance.
The method outperforms shallow graph baselines with limited additional computation.
Abstract
Workbook-scale spreadsheet understanding is increasingly important for language-model-based data analysis agents, but remains challenging because relevant information is often distributed across multiple sheets with heterogeneous schemas, layouts, and implicit relationships. Existing retrieval-augmented approaches typically decompose spreadsheets into rows, columns, or blocks to improve scalability; however, such chunk-centric representations can fragment worksheets into isolated text spans and weaken global sheet-level semantics. We propose Sheet as Token, a graph-enhanced framework that treats each worksheet as a unified semantic unit for multi-sheet spreadsheet retrieval. Our method extracts schema-aware records from sheet names, column headers, representative values, and layout features, and encodes each worksheet into a compact dense token. Given a natural-language query, a Graph…
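To make the sheet-level idea concrete, the following is a minimal sketch of representing each worksheet as one unit and ranking sheets against a query. It is not the authors' implementation. The `Sheet` class, `sheet_record`, `embed`, and `rank_sheets` are hypothetical names, the record layout is an assumption, and a sparse bag-of-words vector stands in for the dense encoder the abstract describes. The graph-based reasoning step is not covered, since the abstract is truncated before it is described.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sheet:
    """Hypothetical in-memory worksheet: a name, header row, and data rows."""
    name: str
    headers: list
    rows: list = field(default_factory=list)


def sheet_record(sheet, max_values=3):
    """Flatten one worksheet into a single schema-aware text record."""
    parts = [f"sheet: {sheet.name}"]
    for col, header in enumerate(sheet.headers):
        # Representative values: here simply the first few non-empty cells.
        values = [r[col] for r in sheet.rows if col < len(r) and r[col]]
        parts.append(f"column: {header} | examples: {', '.join(values[:max_values])}")
    # A crude layout feature (assumption): the sheet's shape.
    parts.append(f"layout: {len(sheet.rows)} rows x {len(sheet.headers)} columns")
    return "\n".join(parts)


def embed(text):
    """Placeholder encoder: sparse bag-of-words counts, not a learned dense token."""
    words = (w.strip(".,:;|()") for w in text.lower().split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sheets(query, sheets):
    """Score every sheet as a whole against the query, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(sheet_record(s))), s.name) for s in sheets]
    return sorted(scored, reverse=True)


workbook = [
    Sheet("Sales", ["region", "revenue"], [["EMEA", "1200"], ["APAC", "950"]]),
    Sheet("Employees", ["name", "department"], [["Ada", "Engineering"], ["Lin", "Finance"]]),
]
ranking = rank_sheets("revenue by region", workbook)
print(ranking[0][1])
```

Because each sheet yields exactly one vector, the retrieval index grows with the number of sheets rather than the number of rows or blocks, which is the contrast with chunk-based retrieval that the abstract draws.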
