TAPEX: Table Pre-training via Learning a Neural SQL Executor
Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, and Weizhu Chen, Jian-Guang Lou

TL;DR
TAPEX introduces a novel table pre-training method by learning a neural SQL executor on synthetic data, significantly improving performance on multiple table understanding benchmarks.
Contribution
The paper presents the first approach to table pre-training using synthetic executable SQL programs, achieving state-of-the-art results on various downstream tasks.
Findings
Outperforms previous methods on four benchmark datasets.
Achieves new state-of-the-art accuracy on all evaluated tasks.
Demonstrates effectiveness of synthetic SQL-based pre-training for table understanding.
Abstract
Recent progress in language model pre-training has achieved a great success via leveraging large-scale unstructured textual data. However, it is still a challenge to apply pre-training on structured tabular data due to the absence of large-scale high-quality tabular data. In this paper, we propose TAPEX to show that table pre-training can be achieved by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries and their execution outputs. TAPEX addresses the data scarcity challenge via guiding the language model to mimic a SQL executor on the diverse, large-scale and high-quality synthetic corpus. We evaluate TAPEX on four benchmark datasets. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin and achieves new state-of-the-art results on all of them. This…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗microsoft/tapex-base-finetuned-tabfactmodel· 130 dl130 dl
- 🤗microsoft/tapex-base-finetuned-wikisqlmodel· 898k dl· ♡ 24898k dl♡ 24
- 🤗microsoft/tapex-basemodel· 1.6k dl· ♡ 481.6k dl♡ 48
- 🤗microsoft/tapex-large-finetuned-tabfactmodel· 82 dl· ♡ 882 dl♡ 8
- 🤗nielsr/tapex-large-finetuned-sqamodel· 7 dl7 dl
- 🤗nielsr/tapex-large-finetuned-tabfactmodel· 3 dl3 dl
- 🤗nielsr/tapex-large-finetuned-wikisqlmodel· 3 dl3 dl
- 🤗nielsr/tapex-large-finetuned-wtqmodel· 7 dl· ♡ 27 dl♡ 2
- 🤗nielsr/tapex-largemodel· 2 dl· ♡ 12 dl♡ 1
- 🤗microsoft/tapex-large-finetuned-wikisqlmodel· 44 dl· ♡ 1744 dl♡ 17
Videos
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
MethodsTable Pre-training via Execution
