PerCoR: Evaluating Commonsense Reasoning in Persian via Multiple-Choice Sentence Completion

Morteza Alikhani; Mohammadtaha Bagherifard; Erfan Zinvandi; Mehran Sarmadi

arXiv:2510.22616·cs.CL·January 19, 2026

TL;DR

PerCoR is a large-scale Persian benchmark for commonsense reasoning, built with a novel conjunction-based sentence segmentation strategy and a generation-free distractor selection method (DRESS-AF). It poses significant challenges for current models and is reported to transfer to English benchmarks.

Contribution

Introduces PerCoR, the first large-scale Persian commonsense reasoning dataset, with innovative segmentation and distractor generation techniques, and demonstrates its transferability to English benchmarks.

Findings

01 Humans score 89% on PerCoR
02 OpenAI-o3 achieves 92.18% accuracy
03 DeepSeek-R1 reaches 82.51% accuracy

Abstract

We introduce PerCoR (Persian Commonsense Reasoning), the first large-scale Persian benchmark for commonsense reasoning. PerCoR contains 106K multiple-choice sentence-completion problems drawn from more than forty news, cultural, and other web sources. We introduce a novel conjunction-based segmentation strategy to generate coherent sentence-completion pairs, enabling broad topical and structural diversity. To create challenging distractors, we propose DRESS-AF (Distractor Ranking via Embedding Similarity Scoring and Adversarial Filtering), a generation-free adversarial filtering method that selects distractors from the pool of gold continuations while maximising model confusion. Human annotators score 89% on PerCoR, while OpenAI-o3 achieves the highest performance at 92.18%, followed closely by Claude-Sonnet-3.7 (91.17%). The strongest open-source model, DeepSeek-R1, reaches 82.51%,…
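
The two ideas in the abstract, splitting a sentence at a conjunction and picking distractors from other gold continuations, can be illustrated with a rough sketch. This is not the authors' implementation: the English conjunction list, the toy embedding vectors, the assumption that higher cosine similarity to the gold continuation makes a harder distractor, and the `is_easy` predicate standing in for the adversarial filter are all hypothetical choices made for illustration.

```python
import math

# Hypothetical English stand-ins; the benchmark itself uses Persian text.
CONJUNCTIONS = ("because", "but", "so", "although")


def split_at_conjunction(sentence, conjunctions=CONJUNCTIONS):
    """Split a sentence into (context, continuation) at the first conjunction.

    The conjunction stays at the end of the context. Returns None when no
    conjunction occurs with text on both sides of it.
    """
    words = sentence.split()
    for i, word in enumerate(words):
        if word.lower().strip(",.;") in conjunctions and 0 < i < len(words) - 1:
            return " ".join(words[: i + 1]), " ".join(words[i + 1:])
    return None


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_distractors(gold_vec, pool, k=3, is_easy=lambda text: False):
    """Pick k distractors from a pool of other items' gold continuations.

    pool is a list of (text, embedding) pairs. Candidates are ranked by
    similarity to the gold continuation (assumed here to mean "harder"), and
    any candidate that the placeholder filter `is_easy` flags is dropped.
    """
    ranked = sorted(pool, key=lambda item: cosine(gold_vec, item[1]), reverse=True)
    kept = [text for text, _ in ranked if not is_easy(text)]
    return kept[:k]
```

In a real pipeline the embeddings would come from a sentence encoder and `is_easy` would be replaced by a model-based check of whether a candidate is trivially rejected; both are left abstract here because the abstract does not specify them.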
