Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs
Arian Askari, Roxana Petcu, Chuan Meng, Mohammad Aliannejadi, Amin Abolghasemi, Evangelos Kanoulas, Suzan Verberne

TL;DR
This paper introduces SOLID, a novel LLM-based method for zero-shot generation of large-scale, intent-aware information-seeking dialogs, improving intent prediction models without manual annotation.
Contribution
The paper proposes SOLID with self-seeding and multi-intent self-instructing schemes, and SOLID-RL for enhanced dialog generation, reducing manual effort and surpassing existing dataset sizes.
Findings
Generated over 300k intent-aware dialogs, surpassing existing datasets in size
Intent prediction (IP) models trained on SOLID data outperform those trained on human data
Self-instructing schemes improve dialog quality and complexity handling
Abstract
Identifying user intents in information-seeking dialogs is crucial for a system to meet users' information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study on using LLMs to generate intent-aware information-seeking dialogs. In this paper, we focus on leveraging LLMs for zero-shot generation of large-scale, open-domain, and intent-aware information-seeking dialogs. We propose SOLID, which has novel self-seeding and multi-intent self-instructing schemes. The former improves the generation quality by using the LLM's own knowledge scope to initiate dialog generation; the latter prompts the LLM to generate utterances sequentially, and mitigates the need…
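The two schemes named in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, prompt wording, and intent labels are hypothetical, and `fake_llm` is a deterministic stand-in for a real model call. It only shows the general shape: the seed text is produced by the LLM itself, and utterances are then generated one at a time, each prompted with the intents it should express, so every utterance carries its intent labels by construction.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]      # any text-in, text-out model call (assumption)
Turn = Tuple[List[str], str]    # (intent labels, utterance)


def self_seed(llm: LLM, topic: str) -> str:
    """Self-seeding, simplified: the seed text comes from the LLM itself."""
    return llm(f"Give a short background passage about: {topic}")


def generate_dialog(llm: LLM, seed: str, intent_plan: List[List[str]]) -> List[Turn]:
    """Sequential generation, simplified: one prompt per utterance."""
    dialog: List[Turn] = []
    for intents in intent_plan:
        # Only the utterances generated so far are shown to the model.
        history = "\n".join(utterance for _, utterance in dialog)
        prompt = (
            f"Background: {seed}\n"
            f"Dialog so far:\n{history}\n"
            f"Write the next utterance expressing these intents: {', '.join(intents)}"
        )
        # The requested intents are stored with the utterance as its labels.
        dialog.append((intents, llm(prompt)))
    return dialog


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in so the sketch runs without a model."""
    return f"<generated text for a {len(prompt)}-character prompt>"


# Hypothetical intent labels, chosen for illustration only.
plan = [
    ["original_question"],
    ["clarifying_question"],
    ["potential_answer", "further_details"],
]
dialog = generate_dialog(fake_llm, self_seed(fake_llm, "jazz history"), plan)
for intents, utterance in dialog:
    print(intents, utterance)
```

Replacing `fake_llm` with a call to an actual model is the only change needed to try the idea; how SOLID selects seeds and intent sequences is described in the paper itself and is not reproduced here.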
Taxonomy
Topics: Semantic Web and Ontologies · Speech and dialogue systems · Business Process Modeling and Analysis
