PerCoR: Evaluating Commonsense Reasoning in Persian via Multiple-Choice Sentence Completion
Morteza Alikhani, Mohammadtaha Bagherifard, Erfan Zinvandi, Mehran Sarmadi

TL;DR
PerCoR is a large-scale Persian benchmark for commonsense reasoning, built with novel sentence segmentation and distractor generation methods. It poses significant challenges for current models and is reported to transfer to English benchmarks.
Contribution
Introduces PerCoR, the first large-scale Persian commonsense reasoning dataset, with innovative segmentation and distractor generation techniques, and demonstrates its transferability to English benchmarks.
Findings
Humans score 89% on PerCoR
OpenAI-o3 achieves 92.18% accuracy
DeepSeek-R1 reaches 82.51% accuracy
Abstract
We introduce PerCoR (Persian Commonsense Reasoning), the first large-scale Persian benchmark for commonsense reasoning. PerCoR contains 106K multiple-choice sentence-completion problems drawn from more than forty news, cultural, and other web sources. We introduce a novel conjunction-based segmentation strategy to generate coherent sentence-completion pairs, enabling broad topical and structural diversity. To create challenging distractors, we propose DRESS-AF (Distractor Ranking via Embedding Similarity Scoring and Adversarial Filtering), a generation-free adversarial filtering method that selects distractors from the pool of gold continuations while maximising model confusion. Human annotators score 89% on PerCoR, while OpenAI-o3 achieves the highest performance at 92.18%, followed closely by Claude-Sonnet-3.7 (91.17%). The strongest open-source model, DeepSeek-R1, reaches 82.51%,…
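As a rough illustration of the two ideas named in the abstract, the toy sketch below splits a sentence at a conjunction and then ranks other gold continuations by embedding similarity to pick candidate distractors. It is not the authors' implementation: the function names, the English conjunction list (standing in for Persian conjunctions), the choice to keep the conjunction in the context, and the hand-written 2-D vectors are all assumptions made for readability. The adversarial filtering stage of DRESS-AF, which the abstract says maximises model confusion, is not shown.

```python
import math

# Hypothetical conjunction list. The paper works with Persian conjunctions;
# English ones are used here only to keep the toy example readable.
CONJUNCTIONS = ("because", "but", "so")


def split_at_conjunction(sentence):
    """Split a sentence into (context, continuation) at the first listed conjunction.

    The conjunction is kept at the end of the context (an assumption).
    Returns None when no usable conjunction is found.
    """
    words = sentence.split()
    for i, word in enumerate(words):
        # Require at least one word on each side of the conjunction.
        if word.lower() in CONJUNCTIONS and 0 < i < len(words) - 1:
            return " ".join(words[: i + 1]), " ".join(words[i + 1:])
    return None


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_distractors(gold_index, embeddings, k=3):
    """Return indices of the k pooled continuations most similar to the gold one.

    `embeddings` holds one vector per gold continuation in the pool; a real
    pipeline would obtain them from a sentence-embedding model.
    """
    gold = embeddings[gold_index]
    scored = [
        (cosine(gold, emb), j)
        for j, emb in enumerate(embeddings)
        if j != gold_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [j for _, j in scored[:k]]


# Toy usage with hand-written vectors (not real embeddings).
pair = split_at_conjunction("She stayed home because it was raining")
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
candidates = rank_distractors(0, pool, k=2)
```

Drawing distractors from other items' gold continuations keeps every option fluent, human-written text, which matches the "generation-free" property the abstract describes. The similarity ranking only narrows the pool to plausible-looking candidates.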
