Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Huadai Liu, Rongjie Huang, Jinzheng He, Gang Sun, Ran Shen, Xize Cheng, Zhou Zhao

TL;DR
Wav2SQL is a novel direct speech-to-SQL model that leverages large-scale pre-training and techniques like speech re-programming to improve robustness and accuracy in converting spoken questions to SQL queries, especially in out-of-domain scenarios.
Contribution
It introduces the first direct speech-to-SQL parsing model, Wav2SQL, and provides a large-scale multi-speaker dataset MASpider to advance research in this area.
Findings
Achieves up to 2.5% accuracy improvement over baselines.
Avoids error compounding in cascaded systems.
Enhances generalization to out-of-domain speech data.
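The error-compounding point can be illustrated with a small arithmetic sketch. It assumes a simplified model that is not taken from the paper: a cascaded system produces a correct SQL query only when the speech recognizer and the text-to-SQL parser both succeed, and their errors are independent. The function name and the accuracy values are hypothetical and chosen only for illustration.

```python
def cascaded_accuracy(asr_accuracy: float, parser_accuracy: float) -> float:
    """Estimate end-to-end accuracy of a two-stage pipeline.

    Simplifying assumption (not from the paper): the pipeline is correct
    only if both stages are correct, and stage errors are independent.
    """
    for value in (asr_accuracy, parser_accuracy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("accuracies must lie in [0, 1]")
    return asr_accuracy * parser_accuracy


# Hypothetical stage accuracies, not results reported for Wav2SQL.
estimate = cascaded_accuracy(0.9, 0.7)
print(round(estimate, 2))
```

Under this assumption the cascade can never beat its weakest stage, which is the motivation for a direct model that maps speech to SQL in a single step.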
Abstract
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given relational databases. It has traditionally been implemented in a cascaded manner and faces the following challenges: 1) model training suffers from data scarcity, since only limited parallel data is available; and 2) systems should be robust enough to handle diverse out-of-domain speech samples that differ from the source data. In this work, we propose the first direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding across cascaded systems. Specifically, 1) to accelerate speech-driven SQL parsing research in the community, we release a large-scale, multi-speaker dataset, MASpider; 2) leveraging recent progress in large-scale pre-training, we show that it alleviates the data scarcity issue and allows for direct speech-to-SQL parsing; and 3) we include the…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
