FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining
Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang

TL;DR
FORTAP is a table pretraining approach that uses spreadsheet formulas as supervision for numerical reasoning, achieving state-of-the-art results on cell type classification and formula prediction.
Contribution
It is the first to utilize large-scale spreadsheet formulas for numerical reasoning-aware table pretraining, guiding models to learn calculations and references.
Findings
Achieves state-of-the-art results on cell type classification
Achieves state-of-the-art results on formula prediction
Demonstrates the potential of formula-based, numerical-reasoning-aware pretraining
Abstract
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning. More importantly, large amounts of spreadsheets with expert-made formulae are available on the web and can be obtained easily. FORTAP is the first method for numerical-reasoning-aware table pretraining that leverages a large corpus of spreadsheet formulae. We design two formula pretraining tasks to explicitly guide FORTAP to learn numerical reference and calculation in semi-structured tables. FORTAP achieves state-of-the-art results on two representative downstream tasks, cell type classification and formula prediction, showing the great potential of numerical-reasoning-aware pretraining.
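To make the idea of formulas as supervision concrete, the following is a minimal Python sketch of how a single spreadsheet formula could be turned into two kinds of labels: the cells it references and the operators it applies. This is a toy illustration only, not the FORTAP pipeline; the helper names (`referenced_cells`, `operators`) and the regular expressions are assumptions made for this example, and the parser ignores absolute references such as `$B$2`, sheet names, and nested edge cases.

```python
import re

# Toy illustration only (hypothetical helpers, not the authors' code).
# Matches A1-style references and ranges such as B2 or B2:B4,
# but not function names like LOG10( that look like cell addresses.
CELL_RANGE = re.compile(r"\b([A-Z]+)(\d+)(?::([A-Z]+)(\d+))?\b(?!\()")
FUNCTION = re.compile(r"([A-Z][A-Z0-9]*)\(")
ARITHMETIC = re.compile(r"[+\-*/]")


def col_to_index(col):
    """Convert a column label to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    idx = 0
    for ch in col:
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx


def index_to_col(idx):
    """Convert a 1-based column index back to its label."""
    letters = ""
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def referenced_cells(formula):
    """List every cell a formula refers to, expanding ranges cell by cell."""
    cells = []
    for c1, r1, c2, r2 in CELL_RANGE.findall(formula):
        if not c2:  # single reference such as C5
            c2, r2 = c1, r1
        for col in range(col_to_index(c1), col_to_index(c2) + 1):
            for row in range(int(r1), int(r2) + 1):
                cells.append(f"{index_to_col(col)}{row}")
    return cells


def operators(formula):
    """List function names first, then arithmetic symbols, found in a formula."""
    return FUNCTION.findall(formula) + ARITHMETIC.findall(formula)


formula = "=SUM(B2:B4)/C5"
print(referenced_cells(formula))  # ['B2', 'B3', 'B4', 'C5']
print(operators(formula))         # ['SUM', '/']
```

In this sketch, the referenced cells stand in for a "numerical reference" signal and the operators for a "calculation" signal; how FORTAP actually defines and trains on its two pretraining tasks is described in the paper itself.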
Taxonomy
Topics: Statistics Education and Methodologies · Spreadsheets and End-User Computing · Data Visualization and Analytics
