Synthetic Query Generation using Large Language Models for Virtual Assistants

Sonal Sannigrahi; Thiago Fraga-Silva; Youssef Oualil; Christophe Van Gysel

arXiv:2406.06729·cs.IR·June 12, 2024

TL;DR

This paper explores using Large Language Models to generate synthetic user queries for Virtual Assistants, aiming to improve speech recognition and understanding by creating more realistic and specific query data.

Contribution

It demonstrates that LLMs can generate more verbose, entity-specific queries that complement traditional template-based methods for VA training data augmentation.
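The template-based baseline can be pictured as slot filling: a fixed set of carrier phrases is crossed with an entity list. The sketch below illustrates that idea only; the templates, the entity names and the `expand_templates` helper are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical carrier-phrase templates; not the paper's actual templates.
TEMPLATES = [
    "play {entity}",
    "play songs by {entity}",
    "who is {entity}",
]


def expand_templates(templates: list[str], entities: list[str]) -> list[str]:
    """Fill the {entity} slot of every template with every entity (cartesian product)."""
    return [t.format(entity=e) for t, e in product(templates, entities)]


if __name__ == "__main__":
    # Hypothetical entity list used only for illustration.
    for query in expand_templates(TEMPLATES, ["Nina Simone", "Miles Davis"]):
        print(query)
```

Every generated query has exactly the shape of its template, so the output is short and generic by construction. An LLM given information about an entity can instead write longer queries that mention entity-specific details, which is the complementary behaviour the paper reports.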

Findings

01. LLMs produce more detailed, entity-specific queries

02. Generated queries are similar to real VA user queries

03. LLM and template methods are complementary

Abstract

Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusable alternatives. Hence, generating synthetic queries that are similar to existing VA usage can greatly improve the VA's abilities, especially for use cases that do not (yet) occur in paired audio/text data. In this paper, we provide a preliminary exploration of the use of Large Language Models (LLMs) to generate synthetic queries that are complementary to template-based methods. We investigate whether the methods (a) generate queries that are similar to randomly sampled, representative, and anonymized user queries from a popular VA, and (b) generate queries that are specific. We find that LLMs…
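Why text-only synthetic queries help speech recognition can be shown with a toy example. A prior trained only on text is combined with an acoustic score to choose between similar-sounding hypotheses, and an entity that never appears in the text corpus is penalised. The sketch below assumes an add-one smoothed unigram prior, a fixed LM weight and made-up acoustic scores. `UnigramPrior`, `rescore`, the corpora and the scores are all hypothetical. Production VA systems use far stronger language models, and this is not the paper's system.

```python
import math
from collections import Counter


class UnigramPrior:
    """Toy text-only query prior: add-one smoothed unigram language model."""

    def __init__(self, queries: list[str]) -> None:
        self.counts = Counter(w for q in queries for w in q.lower().split())
        self.total = sum(self.counts.values())
        # +1 reserves probability mass for a single unseen-word bucket.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, text: str) -> float:
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[w] + 1) / denom) for w in text.lower().split())


def rescore(hyps: list[tuple[str, float]], prior: UnigramPrior, lm_weight: float = 1.0) -> str:
    """Pick the hypothesis maximising acoustic score + lm_weight * prior log-prob."""
    return max(hyps, key=lambda h: h[1] + lm_weight * prior.log_prob(h[0]))[0]


# Hypothetical "real" queries that never mention the entity "sia".
BASE = ["play music", "see a movie", "play a song"]
# Hypothetical synthetic queries about that entity.
SYNTHETIC = ["play sia", "play songs by sia", "sia chandelier"]

if __name__ == "__main__":
    # Two similar-sounding hypotheses with identical (made-up) acoustic scores.
    hyps = [("play sia", -2.0), ("play see", -2.0)]
    print(rescore(hyps, UnigramPrior(BASE)))              # prior trained on real queries only
    print(rescore(hyps, UnigramPrior(BASE + SYNTHETIC)))  # prior with synthetic queries added
```

With the base corpus alone, the unseen word "sia" gets only smoothing mass, so "play see" wins. After the synthetic queries are added, the prior assigns "sia" enough probability for "play sia" to win. This illustrates how covering entities absent from paired audio/text data can change recognition outcomes.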
