OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering
Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, Weizhu Chen

TL;DR
OmniTab introduces a pretraining method that uses both natural and synthetic data to improve few-shot and full-data table-based question answering, achieving state-of-the-art results with minimal annotation.
Contribution
The paper presents a novel omnivorous pretraining approach that combines natural and synthetic data to enhance table-based QA models with minimal annotation effort.
Findings
Achieves a 16.2% absolute gain in the 128-shot setting
Establishes a new state of the art on WikiTableQuestions
Demonstrates the effectiveness of combining natural and synthetic pretraining data
Abstract
The information in tables can be an important complement to text, making table-based question answering (QA) systems of great value. The intrinsic complexity of handling tables often adds an extra burden to both model design and data annotation. In this paper, we aim to develop a simple table-based QA model with minimal annotation effort. Motivated by the fact that table-based QA requires both alignment between questions and tables and the ability to perform complicated reasoning over multiple table elements, we propose an omnivorous pretraining approach that consumes both natural and synthetic data to endow models with these respective abilities. Specifically, given freely available tables, we leverage retrieval to pair them with relevant natural sentences for mask-based pretraining, and synthesize NL questions by converting SQL sampled from tables for pretraining with a QA loss. We…
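The abstract describes two kinds of pretraining data built from tables: natural sentences retrieved for a table and then masked, and synthetic questions derived from SQL sampled over the table. The minimal Python sketch below makes that pipeline concrete on a toy table. Everything in it is an illustrative assumption rather than the authors' implementation. The table format, the token-overlap retriever (a crude stand-in for the retrieval step), the word-boundary masking of cell values, and the question template (the abstract does not say here how SQL is converted to NL) are all hypothetical, as are the function names.

```python
import random
import re
from typing import Dict, List, Tuple

# Hypothetical toy table format: a header plus rows of string cells.
Table = Dict[str, List]

TABLE: Table = {
    "header": ["year", "city"],
    "rows": [["2004", "Athens"], ["2008", "Beijing"], ["2012", "London"]],
}
CORPUS = [
    "The weather in Paris was mild.",
    "Beijing hosted the Summer Olympics in 2008.",
]


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def retrieve_sentence(table: Table, corpus: List[str]) -> str:
    """Natural data: pick the sentence sharing the most tokens with the table's cells.
    A crude lexical stand-in for a real retriever."""
    cell_tokens = {tok for row in table["rows"] for cell in row for tok in tokenize(str(cell))}
    return max(corpus, key=lambda s: len(cell_tokens & set(tokenize(s))))


def mask_aligned_mentions(sentence: str, table: Table, mask: str = "<mask>") -> Tuple[str, List[str]]:
    """Mask cell values mentioned in the sentence, so that recovering them
    requires aligning the sentence with the table."""
    masked, targets = sentence, []
    for row in table["rows"]:
        for cell in row:
            value = str(cell)
            pattern = r"\b" + re.escape(value) + r"\b"  # whole-word match only
            masked, n = re.subn(pattern, mask, masked, count=1)
            if n:
                targets.append(value)
    return masked, targets


def synthesize_qa(table: Table, rng: random.Random) -> Tuple[str, str, List[str]]:
    """Synthetic data: sample a simple SQL query, execute it, and verbalize it.
    The template is a hypothetical stand-in for the SQL-to-NL conversion step."""
    header, rows = table["header"], table["rows"]
    row = rng.choice(rows)  # anchor the condition on a real row so the answer is non-empty
    ans_col, cond_col = rng.sample(range(len(header)), 2)
    value = row[cond_col]
    sql = f"SELECT {header[ans_col]} FROM t WHERE {header[cond_col]} = '{value}'"
    answer = [r[ans_col] for r in rows if r[cond_col] == value]  # "execute" the query
    question = f"What is the {header[ans_col]} when the {header[cond_col]} is {value}?"
    return sql, question, answer


if __name__ == "__main__":
    sentence = retrieve_sentence(TABLE, CORPUS)
    print(mask_aligned_mentions(sentence, TABLE))
    print(synthesize_qa(TABLE, random.Random(0)))
```

In this sketch the masked sentence would be paired with the table for a mask-based objective and the synthetic (question, answer) pairs would be used with a QA loss, as the abstract describes; the code only produces the examples and does no training.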
Code & Models
- 🤗 neulab/omnitab-large (model)
- 🤗 neulab/omnitab-large-finetuned-wtq (model)
- 🤗 neulab/omnitab-large-16shot (model)
- 🤗 neulab/omnitab-large-16shot-finetuned-wtq-16shot (model)
- 🤗 neulab/omnitab-large-128shot (model)
- 🤗 neulab/omnitab-large-128shot-finetuned-wtq-128shot (model)
- 🤗 neulab/omnitab-large-1024shot (model)
- 🤗 neulab/omnitab-large-1024shot-finetuned-wtq-1024shot (model)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
