PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph
Romina Etezadi, Mehrnoush Shamsfard

TL;DR
PeCoQ is a newly created Persian dataset with 10,000 complex questions and answers over the FarsBase knowledge graph, including paraphrases and various complexity types, to facilitate question answering research.
Contribution
This paper introduces PeCoQ, the first large-scale Persian question answering dataset with complex questions, paraphrases, and associated SPARQL queries over a knowledge graph.
Findings
Contains 10,000 complex questions and answers extracted from FarsBase.
Includes the SPARQL query and two paraphrases, written by linguists, for each question.
Features diverse complexity types such as multi-relation and temporal constraints.
Abstract
Question answering systems may find the answers to users' questions in either unstructured texts or structured data such as knowledge graphs. Answering questions with supervised learning approaches, including deep learning models, requires large training datasets. In recent years, several datasets have been presented for the task of question answering over knowledge graphs, which is the focus of this paper. Although many datasets have been proposed in English, there are few question-answering datasets in Persian. This paper introduces PeCoQ, a dataset for Persian question answering. The dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph FarsBase. For each question, the SPARQL query and two paraphrases written by linguists are provided as well. The dataset covers different types of complexity, such as multi-relation,…
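The per-question contents described in the abstract (a question, its SPARQL query, its answers, and two paraphrases) could be held in a record like the one sketched below. This is only an illustration: the class name, field names, the `validate` helper, the English placeholder text, and the `example.org` identifiers are all assumptions, not the schema or contents of the released dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QARecord:
    """One hypothetical entry; field names are assumptions, not the released schema."""
    question: str
    sparql: str
    answers: List[str]
    paraphrases: List[str] = field(default_factory=list)
    complexity_types: List[str] = field(default_factory=list)

    def all_phrasings(self) -> List[str]:
        # The original question followed by its paraphrases.
        return [self.question, *self.paraphrases]


def validate(record: QARecord) -> List[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    if len(record.paraphrases) != 2:
        problems.append("expected exactly two paraphrases")
    if not record.sparql.strip().upper().startswith(("SELECT", "ASK", "PREFIX")):
        problems.append("SPARQL query should start with SELECT, ASK, or PREFIX")
    if not record.answers:
        problems.append("expected at least one answer")
    return problems


# Placeholder content only: real entries are in Persian and use FarsBase identifiers.
example = QARecord(
    question="<original question text>",
    sparql="SELECT ?x WHERE { <http://example.org/e1> <http://example.org/p1> ?x . }",
    answers=["<answer entity>"],
    paraphrases=["<paraphrase 1>", "<paraphrase 2>"],
    complexity_types=["multi-relation"],
)

problems = validate(example)
phrasings = example.all_phrasings()
```

Keeping the paraphrases on the same record as the SPARQL query means all three phrasings of a question share one gold query and answer set, which is convenient when building training pairs.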
