q2d: Turning Questions into Dialogs to Teach Models How to Search

Yonatan Bitton; Shlomi Cohen-Ganor; Ido Hakimi; Yoad Lewenberg; Roee Aharoni; Enav Weinreb

arXiv:2304.14318·cs.CL·December 27, 2023·1 cites

Open Access

TL;DR

This paper introduces q2d, an automatic pipeline that generates information-seeking dialogs from questions. The generated dialogs are used to train models to issue search queries, which reduces reliance on human-written data and enables adaptation to new domains.

Contribution

q2d automates dialog data generation from questions by prompting a large language model (PaLM), giving better control and scale when building training data for search-grounded dialog systems.
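As a rough illustration of the question-to-dialog idea, the sketch below builds a few-shot prompt from a question and wraps the model output as a training example. It is not the authors' implementation: the names `GroundedDialog`, `build_prompt`, `question_to_dialog`, and the `llm` callable are hypothetical, the prompt layout is invented for this example, and using the original question as the target search query is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GroundedDialog:
    turns: List[str]  # generated dialog turns, one per line of model output
    query: str        # target search query (assumed here to be the source question)
    answer: str       # answer taken from the question answering dataset


def build_prompt(question: str, few_shot: List[Tuple[str, str]]) -> str:
    """Format a few-shot prompt asking a model to rewrite a question as a dialog.

    The "Question:/Dialog:" layout is a hypothetical format, not the paper's prompt.
    """
    parts = [f"Question: {q}\nDialog:\n{dialog}\n" for q, dialog in few_shot]
    parts.append(f"Question: {question}\nDialog:\n")
    return "\n".join(parts)


def question_to_dialog(
    question: str,
    answer: str,
    few_shot: List[Tuple[str, str]],
    llm: Callable[[str], str],
) -> GroundedDialog:
    """Call the (hypothetical) `llm` callable and split its output into turns."""
    raw = llm(build_prompt(question, few_shot))
    turns = [line.strip() for line in raw.splitlines() if line.strip()]
    return GroundedDialog(turns=turns, query=question, answer=answer)


# Usage with a stand-in model; a real pipeline would call an actual LLM here.
def fake_llm(prompt: str) -> str:
    return (
        "User: I'm planning a trip to Paris.\n"
        "Assistant: Nice, anything you want to know?\n"
        "User: How tall is the tower there?"
    )


examples = [("Who wrote Hamlet?", "User: I just saw a play.\nUser: Who wrote it?")]
dialog = question_to_dialog(
    "How tall is the Eiffel Tower?", "330 metres", examples, fake_llm
)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; each resulting `GroundedDialog` pairs a dialog with the query a query-generation model would be trained to produce.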

Findings

01. Models trained on the synthetic data reach 90-97% of the performance of models trained on human-generated data.

02. q2d can generate dialog data for new domains that have no existing dialog dataset.

03. Human evaluators rate the generated dialogs as high quality and find them hard to distinguish from human-written dialogs.

Abstract

One of the exciting capabilities of recent language models for dialog is their ability to independently search for relevant information to ground a given dialog response. However, obtaining training data to teach models how to issue search queries is time- and resource-consuming. In this work, we propose q2d: an automatic data generation pipeline that generates information-seeking dialogs from questions. We prompt a large language model (PaLM) to create conversational versions of question answering datasets, and use it to improve query generation models that communicate with external search APIs to ground dialog responses. Unlike previous approaches, which relied on human-written dialogs with search queries, our method allows us to automatically generate query-based grounded dialogs with better control and scale. Our experiments demonstrate that: (1) For query generation on the QReCC…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications