TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data

Xiang Huang; Jiayu Shen; Shanshan Huang; Sitao Cheng; Xiaxia Wang; Yuzhong Qu

arXiv:2412.19544·cs.CL·December 30, 2024

TL;DR

TARGA is a framework that generates targeted synthetic data for semantic parsing, improving reasoning and generalization in knowledge base question answering without manual annotation.

Contribution

It introduces a dynamic, automated synthetic data generation method that enhances semantic parsing models' performance and generalization without manual labeling.

Findings

01. Outperforms existing non-fine-tuned methods on KBQA datasets.

02. Achieves F1 score improvements of +7.7 on GrailQA and +12.2 on KBQA-Agent.

03. Demonstrates superior sample efficiency and robustness.

Abstract

Semantic parsing, which converts natural language questions into logic forms, plays a crucial role in reasoning within structured environments. However, existing methods encounter two significant challenges: reliance on extensive manually annotated datasets and limited generalization capability to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates high-relevance synthetic data without manual annotation. Starting from the pertinent entities and relations of a given question, we probe for potentially relevant queries through layer-wise expansion and cross-layer combination. We then generate corresponding natural language questions for these constructed queries, and the resulting query-question pairs serve as synthetic demonstrations for in-context learning. Experiments on multiple knowledge base question answering…
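The query construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy knowledge base, the function names (`neighbors`, `expand`, `build_layers`, `combine`), and the path representation are all assumptions made for illustration. It shows one plausible reading of the two steps: growing query paths one hop at a time from the question's entities, then pairing paths from different start entities whose answer sets overlap. For simplicity it pairs paths from any layers, and it omits the question-generation step.

```python
from itertools import combinations

# Toy knowledge base of (subject, relation, object) triples.
# All entities and relations are hypothetical, for illustration only.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "located_in", "Paris"),
    ("Carol", "lives_in", "Paris"),
    ("Alice", "lives_in", "Paris"),
]


def neighbors(node):
    """Yield (relation, direction, other_node) for each triple touching node."""
    for s, r, o in TRIPLES:
        if s == node:
            yield r, "out", o
        if o == node:
            yield r, "in", s


def expand(queries):
    """Layer-wise expansion: extend every query path by one hop.

    `queries` maps a path (start entity followed by hops) to its answer set.
    Only hops that reach at least one node are created, so every new
    query has a non-empty answer set.
    """
    extended = {}
    for path, answers in queries.items():
        for node in answers:
            for rel, direction, other in neighbors(node):
                extended.setdefault(path + ((rel, direction),), set()).add(other)
    return extended


def build_layers(entities, max_hops=2):
    """Return one dict of queries per hop count, starting from the entities."""
    layers = []
    frontier = {(e,): {e} for e in entities}
    for _ in range(max_hops):
        frontier = expand(frontier)
        layers.append(frontier)
    return layers


def combine(layers):
    """Combine paths from different start entities that share answers.

    Each combined query is a conjunction of two paths; its answer set is
    the intersection of the two paths' answer sets.
    """
    flat = [(p, a) for layer in layers for p, a in layer.items()]
    combined = {}
    for (p1, a1), (p2, a2) in combinations(flat, 2):
        shared = a1 & a2
        if p1[0] != p2[0] and shared:
            combined[(p1, p2)] = shared
    return combined


layers = build_layers(["Acme", "Paris"])
conjunctive_queries = combine(layers)
```

With the toy data, the one-hop path "things that work at Acme" and the one-hop path "things that live in Paris" intersect, yielding a conjunctive query that could be paired with a question such as "Who works at Acme and lives in Paris?". The sketch does not filter trivial cyclic paths, which a practical system would likely need to do.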

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Bayesian Modeling and Causal Inference

Methods: Balanced Selection