Probing How Scalable Table Data Enhances General Long-Context Reasoning
Huaibing Xie, Guoliang Zhao, Yang Liu, Shihan Dou, Siming Huang, Yanling Xiao, Shaolei Wang, Yiting Liu, Cheng Zhang, Shaofan Liu, Pluto Zhou

TL;DR
This paper investigates how structured table data can improve large language models' ability to perform long-context reasoning, providing theoretical analysis, a scalable data synthesis pipeline, and empirical validation across multiple benchmarks.
Contribution
It introduces a mathematical analysis of tabular dependencies, proposes a scalable data synthesis pipeline (TableLong), and demonstrates significant improvements in long-context reasoning performance.
Findings
Table data enhances LLMs' long-context reasoning by over 8%.
Periodic structures in tables are key to reasoning improvements.
The proposed pipeline effectively synthesizes data that boosts reasoning capabilities.
Abstract
As real-world tasks grow increasingly complex, long-context reasoning has become a core capability for Large Language Models (LLMs). However, few studies explore which data types are effective for long-context reasoning and why. We find that structured table data with periodic structures shows strong potential for long-context reasoning. Motivated by this observation, we mathematically analyze tabular dependency structures using mutual information, revealing periodic non-vanishing dependencies in table data. Furthermore, we systematically analyze the capabilities of structured table data, conduct relevant scaling experiments, and validate its underlying mechanisms for enhancing long-context reasoning, yielding several meaningful insights. Leveraging these insights, we propose a simple yet scalable pipeline (TableLong) for synthesizing high-quality, diverse, and verifiable structured…
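To make the idea of periodic, non-vanishing dependencies more concrete, the following is a minimal toy sketch and not the paper's analysis. It assumes a hypothetical table with five columns, flattened row by row, in which only the first column has a dominant value; the names `linearize_toy_table` and `lagged_mutual_information`, the vocabulary, and all parameter values are illustrative assumptions. It estimates the mutual information between tokens separated by a fixed lag, using plain plug-in frequency counts.

```python
import math
import random
from collections import Counter


def linearize_toy_table(n_rows, n_cols, vocab, dominant, q, seed=0):
    """Flatten a synthetic table row by row into one token sequence.

    Hypothetical setup: column 0 emits `dominant` with probability q and
    otherwise a uniform draw from `vocab`; all other columns are uniform.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(n_rows):
        for col in range(n_cols):
            if col == 0 and rng.random() < q:
                tokens.append(dominant)
            else:
                tokens.append(rng.choice(vocab))
    return tokens


def lagged_mutual_information(tokens, lag):
    """Plug-in estimate (in bits) of I(X_i; X_{i+lag}) from pair counts."""
    pairs = list(zip(tokens, tokens[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


N_COLS = 5  # assumed row width of the toy table
tokens = linearize_toy_table(4000, N_COLS, list("ABCDEFGH"), "A", 0.9)
mi_by_lag = {
    lag: lagged_mutual_information(tokens, lag)
    for lag in (1, 2, 3, 5, 7, 10, 50)
}
for lag, mi in mi_by_lag.items():
    marker = "same column" if lag % N_COLS == 0 else "different column"
    print(f"lag={lag:3d}  MI={mi:.3f} bits  ({marker})")
```

In this toy setup, lags that are multiples of the row width pair tokens from the same column, so their estimated mutual information should stay clearly above that of other lags, even at lag 50. Real tables and the paper's formal treatment may behave differently; the sketch only illustrates why row-major linearization can produce dependencies that recur at a fixed period instead of fading with distance.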
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
