A Dataset of Open-Domain Question Answering with Multiple-Span Answers

Zhiyi Luo; Yingying Zhang; Shuyun Luo; Ying Zhao; Wentao Lyu

arXiv:2402.09923 · cs.CL · February 16, 2024 · 1 citation

Open Access

TL;DR

CLEAN is a new Chinese multi-span question answering dataset that covers diverse open-domain topics and includes many questions requiring descriptive answers, addressing previous limitations in available benchmarks.

Contribution

The paper introduces CLEAN, a comprehensive Chinese MSQA dataset with diverse questions and descriptive answers, along with baseline models and analysis.

Findings

01. CLEAN presents diverse open-domain questions with many requiring descriptive answers.

02. Baseline models reveal the dataset's complexity and challenge.

03. CLEAN is publicly available for research use.

Abstract

Multi-span answer extraction, also known as multi-span question answering (MSQA), is critical for real-world applications because it requires extracting multiple pieces of information from a text to answer complex questions. Despite active study and rapid progress in English MSQA research, there is a notable lack of publicly available MSQA benchmarks in Chinese. Previous efforts to construct MSQA datasets predominantly emphasized entity-centric contextualization, which biased collection towards factoid questions and potentially overlooked questions requiring more detailed, descriptive responses. To overcome these limitations, we present CLEAN, a comprehensive Chinese multi-span question answering dataset that covers a wide range of open-domain subjects and contains a substantial number of instances requiring descriptive answers. Additionally, we provide established…
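
To make the multi-span extraction task described in the abstract concrete, the following is a minimal sketch of how one MSQA instance and a span-level score might be represented. The class name, field names, and the example text are hypothetical and are not taken from CLEAN's actual schema; the scoring function is a generic exact-match F1 over sets of answer spans, which is an assumption and not necessarily the metric the paper reports.

```python
from dataclasses import dataclass


@dataclass
class MultiSpanExample:
    """One multi-span QA instance (hypothetical schema, not CLEAN's)."""
    question: str
    context: str
    answers: list[tuple[int, int]]  # (start, end) character offsets, end exclusive

    def answer_texts(self) -> list[str]:
        # Slice each gold span out of the context.
        return [self.context[start:end] for start, end in self.answers]


def span_exact_match_f1(predicted: list[str], gold: list[str]) -> float:
    """F1 over sets of spans, counting a span as correct only on exact match."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set or not gold_set:
        # Both empty counts as a perfect match; one empty counts as a miss.
        return float(pred_set == gold_set)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; offsets are computed from the span texts.
context = "Green tea is rich in antioxidants. It may also improve focus."
gold_texts = ["rich in antioxidants", "improve focus"]
spans = [(context.find(t), context.find(t) + len(t)) for t in gold_texts]
example = MultiSpanExample("What are the benefits of green tea?", context, spans)

print(example.answer_texts())
print(span_exact_match_f1(["rich in antioxidants"], gold_texts))
```

A model that recovers only one of the two gold spans gets full precision but half recall, which is how a span-set metric of this kind penalizes incomplete multi-span answers.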

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques