ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum

TL;DR
ComQA is a large, high-quality dataset of real user questions with paraphrase clusters, designed to advance research in complex factoid question answering by capturing diverse question phenomena.
Contribution
The paper introduces ComQA, a large dataset of real user questions from the WikiAnswers community QA platform, cleaned through crowdsourcing, grouped into paraphrase clusters, and annotated with answers at the cluster level.
Findings
State-of-the-art systems perform poorly on ComQA
ComQA covers diverse question phenomena like compositionality and temporal reasoning
The dataset enables future research in complex question answering
Abstract
To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
