IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering in Islamic Text Resources
Arash Ghafouri, Hasan Naderi, Mohammad Aghajani Asl, Mahdi Firouzmandi

TL;DR
IslamicPCQA is the first Persian dataset for multi-hop complex question answering over Islamic texts, pairing each question with supporting facts to enable reasoning over unstructured sources.
Contribution
The paper introduces the first Persian multi-hop QA dataset built from Islamic encyclopedias, designed for complex reasoning over unstructured information sources.
Findings
Contains 12,282 question-answer pairs
Includes supporting facts and key sentences
Covers diverse Islamic topics
Abstract
One of the main challenges for question answering systems today is answering complex questions using various sources of information. Multi-hop questions are a type of complex question that requires multi-step reasoning to answer. This article introduces the IslamicPCQA dataset, the first Persian dataset for answering complex questions based on unstructured information sources. It consists of 12,282 question-answer pairs extracted from 9 Islamic encyclopedias. The dataset was created following the approach of the English HotpotQA dataset, customized to suit the complexities of the Persian language. Answering its questions requires reasoning over more than one paragraph. The questions are not limited to any prior knowledge base or ontology, and to provide robust reasoning ability, the dataset also includes supporting facts and key sentences. The…
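To make the role of supporting facts concrete, here is a minimal Python sketch of what a single multi-hop record could look like. It assumes a HotpotQA-style layout (a question, an answer, a list of titled context paragraphs, and supporting facts given as paragraph title plus sentence index); the abstract says the dataset follows the HotpotQA approach but does not give the actual schema, so the class names, field names, and placeholder strings below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Paragraph:
    title: str
    sentences: List[str]


@dataclass
class MultiHopExample:
    # Hypothetical field names, modeled on the HotpotQA layout.
    question: str
    answer: str
    context: List[Paragraph]
    # Each supporting fact is (paragraph title, sentence index).
    supporting_facts: List[Tuple[str, int]]


def resolve_supporting_facts(example: MultiHopExample) -> List[str]:
    """Return the sentences that the supporting facts point to."""
    by_title = {p.title: p for p in example.context}
    return [by_title[title].sentences[idx]
            for title, idx in example.supporting_facts]


def is_multi_hop(example: MultiHopExample) -> bool:
    """True when the supporting facts come from more than one paragraph."""
    return len({title for title, _ in example.supporting_facts}) > 1


# Placeholder content only; nothing here is taken from the dataset.
example = MultiHopExample(
    question="<question text>",
    answer="<answer text>",
    context=[
        Paragraph("Article A", ["A sentence 0.", "A sentence 1."]),
        Paragraph("Article B", ["B sentence 0."]),
    ],
    supporting_facts=[("Article A", 1), ("Article B", 0)],
)

print(resolve_supporting_facts(example))
print(is_multi_hop(example))
```

Under this assumed layout, a model can be scored both on the answer and on whether it picks out the supporting sentences, which is how supporting facts help make reasoning checkable.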
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Balanced Selection
