PCoQA: Persian Conversational Question Answering Dataset
Hamed Hematian Hemati, Atousa Toghyani, Atena Souri, Sayed Hesam Alavian, Hossein Sameti, Hamid Beigy

TL;DR
The paper introduces PCoQA, a Persian conversational question answering dataset of information-seeking dialogs containing 9,026 questions. It is designed to challenge models with more open-ended answers, longer answers, and fewer lexical overlaps, and the paper evaluates baseline and pre-trained models on it.
Contribution
This work presents the first Persian conversational QA dataset, PCoQA, along with benchmark results, enabling research in Persian language understanding and dialogue systems.
Findings
Baseline models perform modestly on PCoQA.
Pre-trained models improve performance significantly.
The dataset presents challenges like open-ended answers and longer responses.
Abstract
Humans seek information on a specific topic by holding a conversation made up of a series of questions and answers. In pursuit of conversational question answering research, we introduce PCoQA, the first Persian Conversational Question Answering dataset, a resource comprising information-seeking dialogs with a total of 9,026 contextually driven questions. Each dialog involves a questioner, a responder, and a document from Wikipedia; the questioner asks several interconnected questions about the text, and the responder provides a span of the document as the answer to each question. PCoQA is designed to present novel challenges compared to previous question answering datasets, including more open-ended non-factual answers, longer answers, and fewer lexical overlaps. This paper not only presents the comprehensive…
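The dialog structure described in the abstract (one document, a sequence of interconnected questions, and a document span as each answer) can be illustrated with a minimal Python sketch. This is not the dataset's actual schema or file format: the names `Turn`, `Dialog`, `make_turn`, `answer_text`, and `history` are hypothetical, the character-offset span representation is an assumption, and the English document and questions are made-up placeholders rather than PCoQA data.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Turn:
    """One question plus its answer span (hypothetical representation)."""
    question: str
    answer_start: int  # character offset into the document, inclusive
    answer_end: int    # character offset into the document, exclusive


@dataclass
class Dialog:
    """A document and the ordered question/answer turns asked about it."""
    document: str
    turns: List[Turn] = field(default_factory=list)

    def answer_text(self, i: int) -> str:
        # The answer is always a span copied out of the document.
        turn = self.turns[i]
        return self.document[turn.answer_start:turn.answer_end]

    def history(self, i: int) -> List[Tuple[str, str]]:
        # Earlier (question, answer) pairs that turn i may depend on.
        return [(self.turns[j].question, self.answer_text(j)) for j in range(i)]


def make_turn(document: str, question: str, answer: str) -> Turn:
    """Build a Turn by locating the answer string inside the document."""
    start = document.find(answer)
    if start < 0:
        raise ValueError("answer must be a span of the document")
    return Turn(question, start, start + len(answer))


# Placeholder content, not taken from PCoQA.
doc = "Tehran is the capital of Iran. It lies at the foot of the Alborz mountains."
dialog = Dialog(doc)
dialog.turns.append(make_turn(doc, "What is the capital of Iran?", "Tehran"))
dialog.turns.append(
    make_turn(doc, "Where does it lie?", "at the foot of the Alborz mountains")
)
```

The second question ("Where does it lie?") only makes sense given the first turn, which is why a model for this kind of task would typically receive `dialog.history(i)` alongside the current question and the document.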
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
