Natural Response Generation for Chinese Reading Comprehension

Nuo Chen; Hongguang Li; Yinan Bao; Baoyuan Wang; Jia Li

arXiv:2302.08817·cs.CL·October 10, 2023

TL;DR

This paper introduces Penguin, a large-scale Chinese dataset for natural response generation in machine reading comprehension, and proposes effective baseline models including Prompt-BART for improved human-like answer generation.

Contribution

The paper creates the first large-scale benchmark dataset for natural response generation in Chinese MRC and develops novel fine-tuning methods like Prompt-BART.

Findings

01 Prompt-BART significantly improves response quality.

02 Two-stage frameworks outperform end-to-end models.

03 The Penguin dataset enables research on more human-like responses.

Abstract

Machine reading comprehension (MRC) is an important area of research on conversational agents and has drawn a lot of attention. However, current MRC benchmarks have a notable limitation: the labeled answers are mostly either spans extracted from the target corpus or choices among given candidates, ignoring the natural aspect of high-quality responses. As a result, MRC models trained on these datasets cannot generate human-like responses in real QA scenarios. To this end, we construct a new dataset called Penguin to promote MRC research, providing a training and test bed for natural response generation in real scenarios. Concretely, Penguin consists of 200k training examples with high-quality, fluent, and well-informed responses. Penguin is the first benchmark for natural response generation in Chinese MRC on a relatively large scale. To address the challenges in Penguin, we develop two…

Code & Models

Repositories

nuochenpku/penguin
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
