A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention

Marcelo Archanjo Jose; Fabio Gagliardi Cozman

arXiv:2306.14256 · cs.AI · June 27, 2023

TL;DR

This paper introduces a multilingual transformer-based model for translating natural language to SQL, employing database schema pruning to handle long sequences and improve accuracy across four languages.

Contribution

The paper presents a database schema pruning technique and a multilingual fine-tuning approach for natural-language-to-SQL translation, so that long question-plus-schema inputs fit within a transformer's input limit.

Findings

01 Exact set match accuracy on the Spider dataset improved from 0.718 to 0.736 on validation data.

02 Schema pruning lets long question-plus-schema inputs be handled by transformers with up to 512 input tokens.

03 Multilingual fine-tuning benefits non-English SQL translation.

Abstract

Long sequences of text are challenging for transformers because the memory used by the self-attention mechanism grows quadratically with sequence length. This issue directly affects the translation of natural language to SQL queries, since techniques usually take as input a concatenated text containing the question and the database schema. We present techniques that allow long text sequences to be handled by transformers with up to 512 input tokens. We propose a training process with database schema pruning (removal of table and column names that are useless for the query of interest). In addition, we use a multilingual approach with the mT5-large model, fine-tuned on a data-augmented Spider dataset in four languages simultaneously: English, Portuguese, Spanish, and French. Our proposed technique used the Spider dataset and increased the exact set match accuracy results from 0.718 to 0.736 in a…
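The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `prune_schema` and `serialize`, the toy schema, the ` | ` separator, and the rule "keep a name only if it appears as a token in the target SQL" are all assumptions made for illustration. Because the rule relies on the target query, it only applies where that query is known, which fits the abstract's description of pruning as part of the training process.

```python
import re

def prune_schema(schema, sql):
    """Keep only tables and columns whose names appear as tokens in the SQL.

    Hypothetical helper: `schema` maps table name -> list of column names.
    """
    # Identifier-like tokens of the query, lowercased for matching.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
    pruned = {}
    for table, columns in schema.items():
        if table.lower() not in tokens:
            continue  # table is never mentioned, drop it with all its columns
        pruned[table] = [c for c in columns if c.lower() in tokens]
    return pruned

def serialize(question, schema):
    """Concatenate question and schema into one input string (assumed format)."""
    parts = [f"{table}: {', '.join(cols)}" for table, cols in schema.items()]
    return " | ".join([question] + parts)

# Toy example (invented for illustration, not taken from the paper).
schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "concert_name", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}
question = "What are the names of singers older than 30?"
sql = "SELECT name FROM singer WHERE age > 30"

pruned = prune_schema(schema, sql)
full_input = serialize(question, schema)
pruned_input = serialize(question, pruned)
```

The pruned input is shorter than the full one, which is the point: fewer schema tokens leave more of the 512-token budget for the question, and the self-attention cost shrinks with the sequence length.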

Code & Models

Repositories

c4ai/gap-text2sql
PyTorch · Official

Models

Marchanjo/mRAT-SQL
model

Taxonomy

Methods: Pruning