Wav2SQL: Direct Generalizable Speech-To-SQL Parsing

Huadai Liu; Rongjie Huang; Jinzheng He; Gang Sun; Ran Shen; Xize Cheng; Zhou Zhao

arXiv:2305.12552 · cs.CL · May 23, 2023 · 1 citation


Open Access

TL;DR

Wav2SQL is a novel direct speech-to-SQL model that leverages large-scale pre-training and techniques like speech re-programming to improve robustness and accuracy in converting spoken questions to SQL queries, especially in out-of-domain scenarios.

Contribution

It introduces the first direct speech-to-SQL parsing model, Wav2SQL, and provides a large-scale multi-speaker dataset MASpider to advance research in this area.

Findings

01 Achieves up to 2.5% accuracy improvement over baselines.

02 Avoids error compounding in cascaded systems.

03 Enhances generalization to out-of-domain speech data.

Abstract

Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given relational databases. It has traditionally been implemented in a cascaded manner and faces the following challenges: 1) model training suffers from data scarcity, since only limited parallel data is available; and 2) the systems should be robust enough to handle diverse out-of-domain speech samples that differ from the source data. In this work, we propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems. Specifically, 1) to accelerate speech-driven SQL parsing research in the community, we release a large-scale, multi-speaker dataset, MASpider; 2) leveraging recent progress in large-scale pre-training, we show that it alleviates the data scarcity issue and allows for direct speech-to-SQL parsing; and 3) we include the…
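To make "error compounding across cascaded systems" concrete, here is a minimal back-of-the-envelope sketch. It is not from the paper: the function name, its parameters, and the example numbers are all hypothetical, and the model simply assumes the parser's success depends only on whether the transcript is fully correct.

```python
def cascaded_accuracy(asr_sentence_acc: float,
                      parser_acc: float,
                      recovery_rate: float = 0.0) -> float:
    """Rough end-to-end accuracy of an ASR -> text-to-SQL cascade.

    All three arguments are hypothetical probabilities in [0, 1]:
      asr_sentence_acc: chance the transcript is fully correct.
      parser_acc:       chance the parser emits the right SQL
                        given a correct transcript.
      recovery_rate:    chance the parser still emits the right SQL
                        given an incorrect transcript.
    """
    for name, p in (("asr_sentence_acc", asr_sentence_acc),
                    ("parser_acc", parser_acc),
                    ("recovery_rate", recovery_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    # Law of total probability over "transcript correct" vs "transcript wrong".
    return (asr_sentence_acc * parser_acc
            + (1.0 - asr_sentence_acc) * recovery_rate)


if __name__ == "__main__":
    # Illustrative numbers only, not results reported for Wav2SQL.
    print(round(cascaded_accuracy(0.9, 0.7), 2))
```

Under these assumptions the cascade can never beat its weaker stage unless the parser recovers from transcription errors, which is the motivation the abstract gives for a direct speech-to-SQL model.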


Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems