A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese

Anh Tuan Nguyen; Mai Hoang Dao; Dat Quoc Nguyen

arXiv:2010.01891·cs.CL·October 6, 2020

TL;DR

This paper introduces the first public large-scale Vietnamese Text-to-SQL dataset and evaluates two baselines, EditSQL and IRNet, showing that automatic Vietnamese word segmentation and the monolingual language model PhoBERT improve semantic parsing results.

Contribution

The paper provides the first public large-scale Vietnamese Text-to-SQL dataset and systematically evaluates extended EditSQL and IRNet baselines with language-specific enhancements.

Findings

1. Vietnamese word segmentation improves parsing accuracy.

2. NPMI aids schema linking in Vietnamese.

3. PhoBERT outperforms XLM-R for Vietnamese semantic parsing.

Abstract

Semantic parsing is an important NLP task. However, Vietnamese is a low-resource language in this research area. In this paper, we present the first public large-scale Text-to-SQL semantic parsing dataset for Vietnamese. We extend and evaluate two strong semantic parsing baselines, EditSQL (Zhang et al., 2019) and IRNet (Guo et al., 2019), on our dataset. We compare the two baselines with key configurations and find that: automatic Vietnamese word segmentation improves the parsing results of both baselines; the normalized pointwise mutual information (NPMI) score (Bouma, 2009) is useful for schema linking; latent syntactic features extracted from a neural dependency parser for Vietnamese also improve the results; and the monolingual language model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) helps produce higher performance than the recent best multilingual language model XLM-R…
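
The abstract names the NPMI score (Bouma, 2009) as useful for schema linking but does not say how the paper computes it. As a minimal sketch of the general definition only: NPMI divides the pointwise mutual information of two events by -log p(x, y), which gives a value in [-1, 1]. The function name `npmi`, its count-based parameters, and the idea of feeding it co-occurrence counts of question tokens and schema items are assumptions made for illustration, not the authors' implementation.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized pointwise mutual information from raw counts.

    Hypothetical helper: count_xy is how often x and y co-occur,
    count_x and count_y are their individual counts, and total is
    the number of observations. How these counts are gathered is
    an assumption and is not described in the abstract.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if count_xy == 0:
        # Limiting value when x and y never co-occur.
        return -1.0
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        # Degenerate case: both events always occur, so -log p(x, y) = 0.
        return 1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)
```

A score near 1 means the two items almost always occur together, 0 means they are statistically independent, and -1 means they never co-occur.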

Code & Models

Repositories

VinAIResearch/ViText2SQL
Official

Taxonomy

Methods

XLM-R