Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean
Chanwoo Park, Suyoung Park, JiA Kang, Jongyeon Park, Sangho Kim, Hyunji M. Park, Sumin Bae, Mingyu Kang, Jaejin Lee

TL;DR
Ko-MuSR is a comprehensive Korean reasoning benchmark that evaluates multistep reasoning in long narratives, revealing cross-lingual generalization in large language models and the effectiveness of advanced prompting strategies.
Contribution
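It introduces Ko-MuSR, the first benchmark to comprehensively evaluate multistep soft reasoning in long Korean narratives, and shows that prompting strategies combining few-shot examples, reasoning traces, and task-specific hints improve model performance.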
Findings
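Across the four LLMs evaluated, the two multilingual models outperform the two Korean-specialized models on Korean reasoning tasks.
Prompting strategies that combine few-shot examples, reasoning traces, and task-specific hints further boost accuracy, approaching human-level performance.
The multilingual models' advantage indicates that reasoning ability in large language models generalizes across languages.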
Abstract
We present Ko-MuSR, the first benchmark to comprehensively evaluate multistep, soft reasoning in long Korean narratives while minimizing data contamination. Built following MuSR, Ko-MuSR features fully Korean narratives, reasoning chains, and multiple-choice questions verified by human annotators for logical consistency and answerability. Evaluations of four large language models -- two multilingual and two Korean-specialized -- show that multilingual models outperform Korean-focused ones even in Korean reasoning tasks, indicating cross-lingual generalization of reasoning ability. Carefully designed prompting strategies, which combine few-shot examples, reasoning traces, and task-specific hints, further boost accuracy, approaching human-level performance. Ko-MuSR offers a solid foundation for advancing Korean NLP by enabling systematic evaluation of long-context reasoning and prompting…
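The prompting setup described in the abstract (few-shot examples, reasoning traces, and task-specific hints applied to multiple-choice questions) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Example` fields, the `build_prompt` and `accuracy` helpers, and the English prompt labels are assumptions made for illustration, not the authors' actual templates or data schema, and the real benchmark prompts would be in Korean.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical container for one multiple-choice item (not the paper's schema)."""
    narrative: str
    question: str
    choices: list[str]
    answer_index: int      # 0-based index of the correct choice
    reasoning: str = ""    # optional worked reasoning trace for few-shot use


def format_question(ex: Example) -> str:
    """Render the narrative, the question, and 1-based numbered choices."""
    lines = [ex.narrative, "", f"Question: {ex.question}"]
    for i, choice in enumerate(ex.choices, start=1):
        lines.append(f"{i}. {choice}")
    return "\n".join(lines)


def build_prompt(target: Example, shots=(), hint: str = "",
                 with_reasoning: bool = False) -> str:
    """Assemble an optional hint, few-shot examples, and the target question."""
    parts = []
    if hint:
        parts.append(f"Hint: {hint}")
    for shot in shots:
        block = format_question(shot)
        if with_reasoning and shot.reasoning:
            block += f"\nReasoning: {shot.reasoning}"
        block += f"\nAnswer: {shot.answer_index + 1}"
        parts.append(block)
    # End with an open slot so the model continues with reasoning or an answer.
    tail = format_question(target)
    tail += "\nReasoning:" if with_reasoning else "\nAnswer:"
    parts.append(tail)
    return "\n\n".join(parts)


def accuracy(predictions: list[int], examples: list[Example]) -> float:
    """Fraction of 0-based predicted indices that match the gold answers."""
    if not examples:
        return 0.0
    correct = sum(p == ex.answer_index for p, ex in zip(predictions, examples))
    return correct / len(examples)
```

Calling `build_prompt(target, shots=[shot], hint="...", with_reasoning=True)` yields a single string that a model client could consume. Dropping `shots`, `hint`, or `with_reasoning` gives the simpler prompting conditions, which is one way the components could be compared in isolation.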
