MK-SQuIT: Synthesizing Questions using Iterative Template-filling
Benjamin A. Spiegel, Vincent Cheong, James E. Kaplan, Anthony Sanchez

TL;DR
This paper introduces MK-SQuIT, an automated framework that generates large-scale question/query datasets from WikiData through iterative template filling. It reduces the human effort needed to build such datasets, can be adapted to new domains, and produces data for training machine translation models that convert natural language questions into queries.
Contribution
The paper presents a novel, minimally supervised method for synthetic question/query dataset generation leveraging WikiData and multi-layered templating, with no human modification during generation.
Findings
Generated 110,000 question/query pairs across four domains.
Baseline model trained on the dataset shows promising results.
System is adaptable to multiple languages and domains.
Abstract
The aim of this work is to create a framework for synthetically generating question/query pairs with as little human input as possible. These datasets can be used to train machine translation systems to convert natural language questions into queries, a useful tool that could allow for more natural access to database information. Existing methods of dataset generation require human input that scales linearly with the size of the dataset, resulting in small datasets. Aside from a short initial configuration task, no human input is required during the query generation process of our system. We leverage WikiData, a knowledge base of RDF triples, as a source for generating the main content of questions and queries. Using multiple layers of question templating, we are able to sidestep some of the most challenging parts of query generation that have been handled by humans in previous methods;…
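To make the template-filling idea in the abstract concrete, here is a minimal single-layer sketch in Python. It is not the authors' pipeline: the template strings, the `fill_templates` helper, and the small entity and property lists are all hypothetical, chosen only to show how one set of slot values can fill a question template and a query template at the same time. The WikiData identifiers (Q42 for Douglas Adams, P19 for place of birth, P569 for date of birth) are used purely for illustration.

```python
from itertools import product

# Hypothetical paired templates: the question and the SPARQL query share slots.
QUESTION_TEMPLATE = "What is the {prop_label} of {entity_label}?"
# Doubled braces produce literal braces in the formatted query.
QUERY_TEMPLATE = "SELECT ?answer WHERE {{ wd:{entity_id} wdt:{prop_id} ?answer . }}"

# Illustrative slot values; a real system would pull these from the knowledge base.
ENTITIES = [("Q42", "Douglas Adams")]
PROPERTIES = [("P19", "place of birth"), ("P569", "date of birth")]


def fill_templates(entities, properties):
    """Return (question, query) pairs for every entity/property combination."""
    pairs = []
    for (entity_id, entity_label), (prop_id, prop_label) in product(entities, properties):
        slots = {
            "entity_id": entity_id,
            "entity_label": entity_label,
            "prop_id": prop_id,
            "prop_label": prop_label,
        }
        # Each template uses only the slots it names; extra keys are ignored.
        question = QUESTION_TEMPLATE.format(**slots)
        query = QUERY_TEMPLATE.format(**slots)
        pairs.append((question, query))
    return pairs


pairs = fill_templates(ENTITIES, PROPERTIES)
for question, query in pairs:
    print(question)
    print(query)
```

Because both templates are filled from the same slot values, each question stays aligned with its query without manual annotation, which is the property that lets a dataset grow without human effort growing alongside it. The multi-layer templating described in the abstract would go further than this one-step sketch.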
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
