Domain Adaptation of a State of the Art Text-to-SQL Model: Lessons Learned and Challenges Found
Irene Manotas, Octavian Popescu, Ngoc Phuoc An Vo, Vadim Sheinin

TL;DR
This paper evaluates the domain adaptation challenges of top Text-to-SQL models like Picard and T5, analyzing their performance on different databases and proposing a rule-based disambiguation method to improve real-world applicability.
Contribution
It provides an empirical analysis of how well state-of-the-art Text-to-SQL models adapt to new domains and introduces a rule-based approach for value disambiguation without online database access.
Findings
T5 and Picard perform well on certain query structures
Domain adaptation remains a significant challenge
A rule-based disambiguation method improves inference without online DB access
Abstract
The Text-to-SQL task has seen many recent advances, and the Picard model is one of the top-performing models as measured by the Spider dataset competition. However, bringing Text-to-SQL systems to realistic use cases through domain adaptation remains a tough challenge. We analyze how well the base T5 language model and Picard perform on query structures different from those in the Spider dataset; we fine-tuned the base model on the Spider data and on independent databases (DB). To avoid accessing the DB content online during inference, we also present an alternative way to disambiguate the values in an input question, using a rule-based approach that relies on an intermediate representation of the semantic concepts of the question. In our results we show in which cases T5 and Picard can deliver good performance, we share the lessons learned, and discuss current domain…
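The abstract does not spell out the rules or the intermediate representation, so the following is only an illustrative sketch of the general idea: mapping values in a question to columns using schema metadata alone, without reading any DB content. The schema, the cue-word lists, and the function names (`to_intermediate`, `disambiguate`) are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    table: str
    name: str
    dtype: str        # "number" or "text"
    keywords: tuple   # hypothetical cue words that hint at this column

# Hypothetical schema metadata; no row contents are stored or queried.
SCHEMA = [
    Column("singer", "age", "number", ("age", "old", "older", "younger")),
    Column("singer", "country", "text", ("country", "from")),
    Column("concert", "year", "number", ("year", "in")),
]

# Quoted strings, numbers, and plain words, tried in that order.
TOKEN_RE = re.compile(r'"([^"]+)"|(\d+(?:\.\d+)?)|(\w+)')

def to_intermediate(question):
    """Assumed intermediate form: a list of (kind, text) pairs,
    where kind is "text", "number", or "word"."""
    tokens = []
    for match in TOKEN_RE.finditer(question):
        quoted, number, word = match.groups()
        if quoted is not None:
            tokens.append(("text", quoted))
        elif number is not None:
            tokens.append(("number", number))
        else:
            tokens.append(("word", word.lower()))
    return tokens

def disambiguate(question, schema=SCHEMA):
    """Rule: assign each value to the type-compatible column whose
    cue word appears closest to the value in the question."""
    tokens = to_intermediate(question)
    results = []
    for i, (kind, text) in enumerate(tokens):
        if kind == "word":
            continue
        best, best_dist = None, None
        for col in schema:
            if col.dtype != kind:
                continue
            for j, (other_kind, other_text) in enumerate(tokens):
                if other_kind == "word" and other_text in col.keywords:
                    dist = abs(i - j)
                    if best_dist is None or dist < best_dist:
                        best, best_dist = col, dist
        target = f"{best.table}.{best.name}" if best else None
        results.append((text, target))
    return results

print(disambiguate('Which singers from "France" are older than 30?'))
```

A value with no nearby cue word is left unresolved (`None`) rather than guessed, which is one plausible design choice when the database contents cannot be consulted at inference time.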
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Residual Connection · Layer Normalization · Dropout
