TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data
Xiang Huang, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang, Yuzhong Qu

TL;DR
TARGA is a framework that generates targeted synthetic data for semantic parsing, improving reasoning and generalization in knowledge base question answering without manual annotation.
Contribution
It introduces a dynamic, automated synthetic data generation method that enhances semantic parsing models' performance and generalization without manual labeling.
Findings
Outperforms existing non-fine-tuned methods on KBQA datasets.
Achieves significant F1 score improvements (+7.7, +12.2).
Demonstrates superior sample efficiency and robustness.
Abstract
Semantic parsing, which converts natural language questions into logic forms, plays a crucial role in reasoning within structured environments. However, existing methods face two significant challenges: reliance on extensive manually annotated datasets and limited generalization to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates highly relevant synthetic data without manual annotation. Starting from the pertinent entities and relations of a given question, we probe for potentially relevant queries through layer-wise expansion and cross-layer combination. We then generate corresponding natural language questions for these constructed queries, and the query-question pairs jointly serve as synthetic demonstrations for in-context learning. Experiments on multiple knowledge base question answering…
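The query-probing step described in the abstract can be illustrated with a minimal sketch. It assumes a tiny in-memory triple store and that the pertinent entities and relations have already been identified. The functions `neighbors`, `execute`, `expand_layer`, `combine`, and `build_candidates`, the tuple encoding of queries, and the toy triples are all hypothetical and not taken from TARGA. Layer-wise expansion is modeled as extending relation paths by one hop per layer, and cross-layer combination is approximated by conjoining any two candidate queries whose answer sets overlap; generating a natural language question for each query is omitted.

```python
from itertools import combinations

# Hypothetical toy KB of (subject, relation, object) triples; all names are invented.
TRIPLES = [
    ("Inception", "directed_by", "Nolan"),
    ("Interstellar", "directed_by", "Nolan"),
    ("Dunkirk", "directed_by", "Nolan"),
    ("Inception", "genre", "SciFi"),
    ("Interstellar", "genre", "SciFi"),
    ("Dunkirk", "genre", "War"),
    ("Nolan", "born_in", "London"),
]


def neighbors(entity):
    """Outgoing (rel, obj) edges plus incoming edges encoded as (rel^-1, subj)."""
    out = [(r, o) for s, r, o in TRIPLES if s == entity]
    inc = [(r + "^-1", s) for s, r, o in TRIPLES if o == entity]
    return out + inc


def execute(query):
    """Answer set of ("PATH", anchor, rels) or of a conjunction ("AND", q1, q2)."""
    if query[0] == "AND":
        return execute(query[1]) & execute(query[2])
    _, anchor, rels = query
    frontier = {anchor}
    for rel in rels:
        frontier = {nxt for ent in frontier for r, nxt in neighbors(ent) if r == rel}
    return frontier


def expand_layer(queries, relations):
    """Layer-wise expansion: extend each path query by one pertinent relation."""
    layer = set()
    for _, anchor, rels in queries:
        for ent in execute(("PATH", anchor, rels)):
            for rel, _nxt in neighbors(ent):
                if rel.removesuffix("^-1") in relations:
                    # Non-empty by construction: rel leaves at least one current answer.
                    layer.add(("PATH", anchor, rels + (rel,)))
    return layer


def combine(queries):
    """Simplified combination: conjoin two path queries whose answers overlap.
    Pairs may come from the same or from different layers in this sketch."""
    combos = set()
    for q1, q2 in combinations(sorted(queries), 2):
        if execute(q1) & execute(q2):
            combos.add(("AND", q1, q2))
    return combos


def build_candidates(entities, relations, max_layers=2):
    """Seed with zero-hop queries, expand layer by layer, then add conjunctions."""
    layers = [{("PATH", e, ()) for e in entities}]
    for _ in range(max_layers):
        layers.append(expand_layer(layers[-1], relations))
    path_queries = set().union(*layers[1:])  # drop the zero-hop seeds
    return path_queries | combine(path_queries)


if __name__ == "__main__":
    candidates = build_candidates(["Nolan", "SciFi"], {"directed_by", "genre"})
    for q in sorted(candidates, key=repr):
        print(q, "->", sorted(execute(q)))
```

On this toy KB the script yields six path queries and three conjunctions. One conjunction pairs "films directed by Nolan" with "SciFi films" and returns Inception and Interstellar, the kind of query a question such as "Which science-fiction films did Nolan direct?" would need. In a full pipeline, each candidate query would be paired with a generated question to form an in-context demonstration.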
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Bayesian Modeling and Causal Inference
Methods: Balanced Selection
