A Pilot Study for Chinese SQL Semantic Parsing

Qingkai Min; Yuefeng Shi; Yue Zhang

arXiv:1909.13293 · cs.CL · October 17, 2019 · 1 citation

TL;DR

This paper introduces a Chinese version of the Spider text-to-SQL semantic parsing dataset and examines how Chinese word segmentation and cross-lingual embeddings affect SQL query generation.

Contribution

It creates a Chinese SQL semantic parsing dataset and evaluates character- and word-based encoders, highlighting the impact of segmentation errors and cross-lingual embeddings.

Findings

1. Word-based parsers are affected by segmentation errors.
2. Cross-lingual embeddings improve Chinese SQL parsing.
3. The Chinese SQL dataset enables text-to-SQL research on a language that is low-resource for this task.

Abstract

The task of semantic parsing is highly useful for dialogue and question answering systems. Many datasets have been proposed to map natural language text into SQL, among which the recent Spider dataset provides cross-domain samples with multiple tables and complex queries. We build a Spider dataset for Chinese, which is currently a low-resource language in this task area. Interesting research questions arise from the uniqueness of the language, which requires word segmentation, and also from the fact that SQL keywords and columns of DB tables are typically written in English. We compare character- and word-based encoders for a semantic parser, and different embedding schemes. Results show that a word-based semantic parser is subject to segmentation errors and that cross-lingual word embeddings are useful for text-to-SQL.
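To make the segmentation issue concrete, the following is a minimal Python sketch contrasting character-level tokens with the output of a toy word segmenter. The example sentence, the lexicon, and the greedy forward-maximum-matching segmenter (`fmm_segment`) are illustrative assumptions; they are not the segmenter, encoder, or data used in the paper.

```python
from typing import List, Set


def char_tokens(text: str) -> List[str]:
    """Character-level tokenization: each non-space character is one token."""
    return [ch for ch in text if not ch.isspace()]


def fmm_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching over a toy lexicon (hypothetical helper).

    At each position, take the longest lexicon entry that matches;
    fall back to a single character when nothing matches.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon and sentence, chosen to trigger a classic ambiguity.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"  # intended reading: 研究 / 生命 / 起源

print(char_tokens(sentence))           # ['研', '究', '生', '命', '起', '源']
print(fmm_segment(sentence, lexicon))  # ['研究生', '命', '起源']
```

The greedy segmenter prefers the longer entry 研究生 ("graduate student") and strands 命, so a word-based encoder would receive tokens that do not match the intended words. A character-based encoder sees the same six characters regardless of segmentation, which is one plausible reason the two input granularities can behave differently.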

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies