A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention
Marcelo Archanjo Jose, Fabio Gagliardi Cozman

TL;DR
This paper introduces a multilingual transformer-based model for translating natural language to SQL, employing database schema pruning to handle long sequences and improve accuracy across four languages.
Contribution
It presents a schema pruning technique and a multilingual fine-tuning approach for SQL translation, enhancing performance on long text sequences.
Findings
Accuracy improved from 0.718 to 0.736 on validation data.
Schema pruning effectively manages long input sequences.
Multilingual fine-tuning benefits non-English SQL translation.
Abstract
Long sequences of text are challenging in the context of transformers, due to quadratic memory increase in the self-attention mechanism. As this issue directly affects the translation from natural language to SQL queries (as techniques usually take as input a concatenated text with the question and the database schema), we present techniques that allow long text sequences to be handled by transformers with up to 512 input tokens. We propose a training process with database schema pruning (removal of tables and columns names that are useless for the query of interest). In addition, we used a multilingual approach with the mT5-large model fine-tuned with a data-augmented Spider dataset in four languages simultaneously: English, Portuguese, Spanish, and French. Our proposed technique used the Spider dataset and increased the exact set match accuracy results from 0.718 to 0.736 in a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsPruning
