Korean-Specific Dataset for Table Question Answering

Changwook Jun; Jooyoung Choi; Myoseop Sim; Hyun Kim; Hansol Jang; Kyungkoo Min

arXiv:2201.06223·cs.CL·May 3, 2022



TL;DR

This paper introduces Korean-specific datasets for table question answering, including a large collection of tables and a question-answer corpus, and demonstrates a Transformer-based model fine-tuned on these datasets.

Contribution

The paper presents the creation of Korean table question answering datasets and a tailored Transformer-based model, addressing the lack of Korean-specific resources in this domain.

Findings

01. The datasets are publicly available for research use.

02. The Transformer-based model achieves promising results on Korean table QA.

03. The datasets facilitate further research in Korean table question answering.

Abstract

Existing question answering systems mainly focus on text data. However, much of the data produced daily is stored as tables found in documents, in relational databases, or on the web. Many English datasets exist for question answering over tables, but few Korean ones. In this paper, we describe how we construct Korean-specific datasets for table question answering. The Korean tabular dataset is a collection of 1.4M tables with corresponding descriptions for unsupervised pre-training of language models. The Korean table question answering corpus consists of 70k question-answer pairs created by crowd-sourced workers. We then build a pre-trained language model based on the Transformer and fine-tune it for table question answering with these datasets. We then report the…
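To make the table QA setup more concrete, here is a minimal sketch of one common way to feed a question and a table to a Transformer: flatten both into a single text sequence. This is an illustration only, not the authors' preprocessing. The function name `linearize_table`, the `[QUESTION]`, `[HEADER]`, and `[ROW n]` markers, and the toy table are all assumptions made up for this example.

```python
from typing import List


def linearize_table(question: str, header: List[str], rows: List[List[str]]) -> str:
    """Flatten a question and a table into one string.

    The marker format is hypothetical; the paper's abstract does not
    specify how tables are serialized.
    """
    parts = [f"[QUESTION] {question}", "[HEADER] " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        # Every row must have one cell per column.
        if len(row) != len(header):
            raise ValueError(f"row {i} has {len(row)} cells, expected {len(header)}")
        parts.append(f"[ROW {i}] " + " | ".join(row))
    return " ".join(parts)


# Toy table with made-up values (columns: name, score).
header = ["이름", "점수"]
rows = [["A", "90"], ["B", "85"]]
text = linearize_table("B의 점수는?", header, rows)
print(text)
```

A string like this would then be tokenized and passed to the model; how cell positions are encoded beyond plain text order is a separate design choice that this sketch does not cover.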


Code & Models

Repositories

lg-nlp/korwikitablequestions (official)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Absolute Position Encodings · Byte Pair Encoding