ConvSDG: Session Data Generation for Conversational Search

Fengran Mo; Bole Yi; Kelong Mao; Chen Qu; Kaiyu Huang; Jian-Yun Nie

arXiv:2403.11335 · cs.IR · March 19, 2024 · 1 citation

TL;DR

ConvSDG leverages large language models to generate synthetic conversational session data, enhancing training for conversational dense retrieval and improving search effectiveness across multiple datasets.

Contribution

Proposes a novel framework using LLMs for session data generation to improve conversational search models, addressing data scarcity issues.

Findings

01 Generated data improves retrieval performance
02 Effective across multiple datasets
03 Outperforms several strong baselines

Abstract

Conversational search provides a more convenient interface for users by allowing multi-turn interaction with the search engine. However, the effectiveness of conversational dense retrieval methods is limited by the scarcity of the training data required for fine-tuning. Thus, generating more conversational training sessions with relevance labels could potentially improve search performance. Building on the promising text generation capabilities of large language models (LLMs), we propose ConvSDG, a simple yet effective framework to explore the feasibility of boosting conversational search by using LLMs for session data generation. Within this framework, we design dialogue/session-level and query-level data generation with unsupervised and semi-supervised learning, according to the availability of relevance judgments. The generated data are used to fine-tune the…
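The two generation levels named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names, and the `llm` callable (any function that maps a prompt string to generated text) are all hypothetical, and the assumption that a rewritten query reuses the relevance judgment of the turn it was derived from is made here for illustration only.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]


def session_prompt(topic: str, num_turns: int = 5) -> str:
    """Dialogue/session-level prompt (assumed wording): request a whole session."""
    return (
        f"Generate a {num_turns}-turn information-seeking conversation "
        f"about the topic: {topic}. Write one user query per line."
    )


def query_prompt(history: List[str], query: str) -> str:
    """Query-level prompt (assumed wording): request a rewrite of a single turn."""
    context = "\n".join(history)
    return (
        f"Conversation so far:\n{context}\n"
        f"Rewrite the next query with the same intent: {query}"
    )


def generate_session(topic: str, llm: LLM, num_turns: int = 5) -> List[str]:
    """Unsupervised case: no relevance judgments, so generate a full session."""
    raw = llm(session_prompt(topic, num_turns))
    # Keep non-empty lines only, capped at the requested number of turns.
    return [line.strip() for line in raw.splitlines() if line.strip()][:num_turns]


def augment_labeled_session(
    queries: List[str], positives: List[str], llm: LLM
) -> List[Dict[str, object]]:
    """Semi-supervised case: rewrite each turn and reuse its existing label.

    Assumption: the rewritten query inherits the positive passage of the
    original turn, and the history consists of the original earlier queries.
    """
    examples = []
    for i, (query, positive) in enumerate(zip(queries, positives)):
        history = list(queries[:i])
        new_query = llm(query_prompt(history, query)).strip()
        examples.append(
            {"query": new_query, "history": history, "positive": positive}
        )
    return examples
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; the resulting examples would still need to be converted into whatever training format the retriever expects.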

Code & Models

Repositories

fengranmark/convsdg
PyTorch · Official

Taxonomy

Topics: Speech and dialogue systems