Multi-Step Reasoning in Korean and the Emergent Mirage

Guijin Son; Hyunwoo Ko; Dasol Choi

arXiv:2501.05712 · cs.CL · March 13, 2025

TL;DR

This paper introduces HRMCR, a benchmark for evaluating large language models' multi-step reasoning in Korean cultural contexts, revealing emergent behaviors and the challenges models face in culturally specific reasoning tasks.

Contribution

The paper presents HRMCR, a novel benchmark for multi-step reasoning in Korean, highlighting the limitations of current models and analyzing the nature of emergent abilities.

Findings

01

Models trained with fewer than 2×10^25 FLOPs score near zero on the benchmark.

02

Beyond the 2×10^25 training-FLOP threshold, performance improves sharply.

03

The apparent emergent behavior may stem from errors compounding across reasoning steps rather than from a genuinely new capability.
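
The compounding-error reading of finding 03 can be illustrated with a small sketch. It assumes every reasoning step succeeds independently with the same probability, so the chance of a fully correct answer is the per-step accuracy raised to the number of steps. The independence assumption, the function name `final_accuracy`, and the step count of 6 are illustrative choices and are not taken from the paper.

```python
def final_accuracy(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step is correct, assuming independent steps
    that all share the same per-step accuracy (a simplifying assumption)."""
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must lie in [0, 1]")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return step_accuracy ** num_steps


# Hypothetical number of sequential steps per question (not from the paper).
NUM_STEPS = 6

# Per-step accuracy rises smoothly, yet end-to-end accuracy stays close to
# zero for a long time and then climbs quickly.
for p in (0.50, 0.70, 0.90, 0.95, 0.99):
    print(f"per-step {p:.2f} -> end-to-end {final_accuracy(p, NUM_STEPS):.3f}")
```

Under these assumptions, a model that gets each step right half the time answers only about 1.6% of six-step questions correctly, which looks like near-zero performance even though the underlying per-step skill is far from zero.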

Abstract

We introduce HRMCR (HAE-RAE Multi-Step Commonsense Reasoning), a benchmark designed to evaluate large language models' ability to perform multi-step reasoning in culturally specific contexts, focusing on Korean. The questions are automatically generated via templates and algorithms, requiring LLMs to integrate Korean cultural knowledge into sequential reasoning steps. Consistent with prior observations on emergent abilities, our experiments reveal that models trained on fewer than \(2 \cdot 10^{25}\) training FLOPs struggle to solve any questions, showing near-zero performance. Beyond this threshold, performance improves sharply. State-of-the-art models (e.g., O1) still score under 50%, underscoring the difficulty of our tasks. Notably, stepwise analysis suggests the observed emergent behavior may stem from compounding errors across multiple steps rather than reflecting a genuinely new…
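
To get a feel for the training-FLOP threshold quoted in the abstract, the sketch below uses the common rule of thumb that training compute is roughly 6 × parameters × training tokens. Whether the paper estimates compute this way is not stated here, so treat the formula as an assumption. The function names and the three model configurations are hypothetical and do not describe any specific model.

```python
# Threshold quoted in the abstract: 2 * 10^25 training FLOPs.
THRESHOLD_FLOPS = 2e25


def estimate_training_flops(num_params: float, num_tokens: float) -> float:
    """Rough training compute via the 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * num_params * num_tokens


def above_threshold(num_params: float, num_tokens: float) -> bool:
    """True if the estimated compute reaches the abstract's threshold."""
    return estimate_training_flops(num_params, num_tokens) >= THRESHOLD_FLOPS


# Hypothetical (parameters, training tokens) pairs for illustration only.
configs = {
    "small": (8e9, 15e12),
    "medium": (70e9, 15e12),
    "large": (400e9, 15e12),
}

for name, (params, tokens) in configs.items():
    flops = estimate_training_flops(params, tokens)
    side = "above" if above_threshold(params, tokens) else "below"
    print(f"{name}: ~{flops:.2e} FLOPs ({side} threshold)")
```

With these made-up sizes, only the largest configuration crosses the threshold, which shows how few training runs fall on the far side of it.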

Code & Models

Datasets

HAERAE-HUB/HRMCR
dataset · 16 downloads

Taxonomy

Topics: Cognitive Science and Mapping