KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
Kaijing Ma, Xinrun Du, Yunran Wang, Haoran Zhang, Zhoufutu Wen, Xingwei Qu, Jian Yang, Jiaheng Liu, Minghao Liu, Xiang Yue, Wenhao Huang, Ge Zhang

TL;DR
KOR-Bench introduces a new benchmark for evaluating language models' reasoning abilities across diverse, knowledge-orthogonal tasks, emphasizing rule application and out-of-distribution performance.
Contribution
The paper proposes the KOR-Bench benchmark, focusing on knowledge-orthogonal reasoning tasks, and demonstrates its effectiveness through new model evaluations and detailed analyses.
Findings
O1-Preview and O1-Mini reach accuracies of 72.88% and 70.16%, ahead of Claude-3.5-Sonnet (58.96%) and GPT-4o (58.00%).
On the Cipher task, Stepwise Prompting reveals reasoning bottlenecks, and two rounds of Self-Correction yield the best results.
KOR-Bench provides insights into reasoning bottlenecks and model capabilities.
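The Self-Correction finding can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the KOR-Bench code: the toy cipher rule, the `stub_model` stand-in for a language model, the retry prompt wording, and the exact-match check are all hypothetical. The sketch only shows the general shape of applying a newly stated rule and allowing up to two correction rounds.

```python
from typing import Callable, Tuple

# Hypothetical interface: a "model" maps a prompt string to an answer string.
Model = Callable[[str], str]


def toy_cipher(plaintext: str, shift: int = 3) -> str:
    """Made-up rule: shift each uppercase letter forward, then reverse the string."""
    shifted = "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in plaintext)
    return shifted[::-1]


def solve_with_self_correction(
    model: Model, prompt: str, expected: str, max_rounds: int = 2
) -> Tuple[str, int]:
    """Query once, then retry while the answer is wrong, up to max_rounds retries.

    Returns the final answer and the number of correction rounds used.
    """
    answer = model(prompt)
    rounds = 0
    while answer != expected and rounds < max_rounds:
        rounds += 1
        # Assumed feedback format; a real harness may phrase this differently.
        prompt = f"{prompt}\nPrevious answer {answer!r} was wrong; try again."
        answer = model(prompt)
    return answer, rounds


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM: skips the reversal step until told it was wrong."""
    shifted = "".join(chr((ord(c) - 65 + 3) % 26 + 65) for c in "HELLO")
    return shifted[::-1] if "was wrong" in prompt else shifted


expected = toy_cipher("HELLO")
answer, rounds = solve_with_self_correction(
    stub_model, "Encrypt HELLO with the rule.", expected
)
print(answer, rounds)
```

With the stub above, the first attempt omits the reversal step and the first correction round fixes it, so the loop stops before the two-round cap is reached.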
Abstract
In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), a concept aimed at minimizing reliance on domain-specific knowledge, enabling more accurate evaluation of models' reasoning abilities in out-of-distribution settings. Based on this concept, we propose the Knowledge-Orthogonal Reasoning Benchmark (KOR-Bench), encompassing five task categories: Operation, Logic, Cipher, Puzzle, and Counterfactual. KOR-Bench emphasizes models' effectiveness in applying new rule descriptions to solve novel rule-driven questions. O1-Preview and O1-Mini achieve accuracies of 72.88% and 70.16%, surpassing Claude-3.5-Sonnet and GPT-4o (58.96% and 58.00%), highlighting the effectiveness of KOR-Bench. We perform detailed analyses, identifying bottlenecks in the Cipher task with Stepwise Prompting, where two rounds of Self-Correction yield optimal results. We evaluate performance across three…
Taxonomy
Topics: Natural Language Processing Techniques
