GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

Tao Yu; Chien-Sheng Wu; Xi Victoria Lin; Bailin Wang; Yi Chern Tan; Xinyi Yang; Dragomir Radev; Richard Socher; Caiming Xiong

arXiv:2009.13845 · cs.CL · June 1, 2021 · 59 citations

TL;DR

GraPPa is a pre-training method for table semantic parsing that trains on synthetic question-SQL pairs alongside real table-and-language data, learning a compositional inductive bias that improves performance on four benchmarks.

Contribution

The paper proposes a novel pre-training approach using synthetic data and a text-schema linking objective to enhance table semantic parsing models.

Findings

1. Outperforms RoBERTa-large on all tested benchmarks.
2. Establishes new state-of-the-art results.
3. Effectively combines synthetic and real data during pre-training.

Abstract

We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and…
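
The synthesis and labeling steps described in the abstract can be illustrated with a toy sketch. This is not the authors' grammar or code: the single rule, the operator list, the function name `synthesize`, and the role labels `SELECT`, `WHERE`, and `NONE` are all hypothetical, chosen only to show how one synchronous rule can yield a question, its SQL, and a per-column role label of the kind a text-schema linking objective would predict.

```python
import random

# Hypothetical toy rule (an assumption, not taken from the paper): a question
# template paired with a SQL template that shares the same non-terminals,
# which is the "synchronous" part of a synchronous context-free grammar.
RULE = {
    "question": "Show the {COLUMN0} that have {OP0} {VALUE0} {COLUMN1}.",
    "sql": "SELECT {COLUMN0} WHERE {COLUMN1} {OP0} {VALUE0}",
}

# Each operator non-terminal expands to a (natural language, SQL) pair.
OPS = [("more than", ">"), ("less than", "<"), ("exactly", "=")]


def synthesize(columns, values, rng):
    """Instantiate the toy rule over one table; return (question, sql, roles)."""
    col0, col1 = rng.sample(columns, 2)  # two distinct columns
    op_text, op_sql = rng.choice(OPS)
    value = rng.choice(values)
    question = RULE["question"].format(
        COLUMN0=col0, COLUMN1=col1, OP0=op_text, VALUE0=value
    )
    sql = RULE["sql"].format(
        COLUMN0=col0, COLUMN1=col1, OP0=op_sql, VALUE0=value
    )
    # Label every column with its role in the SQL; unused columns get "NONE".
    # These labels stand in for the targets of a linking objective.
    roles = {column: "NONE" for column in columns}
    roles[col0] = "SELECT"
    roles[col1] = "WHERE"
    return question, sql, roles


if __name__ == "__main__":
    rng = random.Random(0)
    question, sql, roles = synthesize(["name", "age", "city"], [10, 20], rng)
    print(question)
    print(sql)
    print(roles)
```

In this sketch each synthetic example carries one label per table column, so a model reading the question together with the column names could be trained to classify each column's role. How GraPPa actually encodes the inputs and defines the label set is not specified in the text above.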

Code & Models

Repositories

taoyds/grappa
pytorch

Videos

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies