mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer
Marcelo Archanjo José, Fabio Gagliardi Cozman

TL;DR
This paper adapts a transformer-based Text-to-SQL system for Portuguese by using multilingual models and translated datasets, demonstrating effective cross-lingual transfer and providing open-source tools for non-English SQL translation tasks.
Contribution
It introduces a multilingual adaptation of RAT-SQL+GAP for Portuguese, including dataset translation and training strategies, enabling effective non-English Text-to-SQL translation.
Findings
Multilingual training improves Portuguese SQL translation performance.
The adapted model achieves 83% of the baseline accuracy.
Training with combined original and translated datasets enhances results.
Abstract
The translation of natural language questions to SQL queries has attracted growing attention, in particular in connection with transformers and similar language models. A large number of techniques are geared towards the English language; in this work, we thus investigated translation to SQL when input questions are given in the Portuguese language. To do so, we properly adapted state-of-the-art tools and resources. We changed the RAT-SQL+GAP system by relying on a multilingual BART model (we report tests with other language models), and we produced a translated version of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train with original and translated training datasets together, even if a single target language is desired. This multilingual BART model fine-tuned with a double-size…
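The abstract's point about training on the original and translated datasets together can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper name combine_datasets, the added "lang" field, and the two sample records are hypothetical, and the record keys (db_id, question, query) are assumed to follow the Spider JSON layout.

```python
import random


def combine_datasets(original, translated, seed=0):
    """Merge English and Portuguese examples into one shuffled training list.

    Hypothetical helper: each example is copied and tagged with a "lang"
    field (an assumption, not part of Spider) so its origin stays visible.
    """
    combined = [dict(ex, lang="en") for ex in original]
    combined += [dict(ex, lang="pt") for ex in translated]
    # Shuffle with a fixed seed so the mixed ordering is reproducible.
    random.Random(seed).shuffle(combined)
    return combined


# Illustrative records only; the SQL target is identical in both languages,
# only the natural language question is translated.
english = [
    {
        "db_id": "concert_singer",
        "question": "How many singers do we have?",
        "query": "SELECT count(*) FROM singer",
    },
]
portuguese = [
    {
        "db_id": "concert_singer",
        "question": "Quantos cantores nós temos?",
        "query": "SELECT count(*) FROM singer",
    },
]

train = combine_datasets(english, portuguese)
print(len(train), sorted(ex["lang"] for ex in train))
```

The merged list is twice the size of either input, which matches the "double-size" training set the abstract refers to; how the authors actually assembled and ordered their data is not stated here.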
Taxonomy
Methods: Test · BERT · mBART · BART
