HyperCLOVA X THINK Technical Report

NAVER Cloud HyperCLOVA X Team

arXiv:2506.22403 · cs.CL · July 2, 2025

TL;DR

HyperCLOVA X THINK is a reasoning-focused large language model pre-trained on roughly 6 trillion Korean and English tokens. It performs competitively on Korea-focused benchmarks, a vision-augmented variant matches or exceeds GPT-4.1 on the KCSAT STEM benchmark, and the team reports efficient training and plans for an open-source release.

Contribution

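Introduces HyperCLOVA X THINK, the first reasoning-focused model in the HyperCLOVA X family: a Peri-LN Transformer scaled with μP, pre-trained through a three-stage curriculum that extends the context window to 128K tokens, and post-trained with supervised fine-tuning and Reinforcement Learning from Verifiable Rewards. A vision-augmented variant adds multimodal capability, and the work emphasizes training efficiency and open-source readiness.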

Findings

01 Performs competitively with similarly sized models on Korea-focused benchmarks such as KMMLU and KoBigBench.

02 A vision-augmented variant matches or exceeds GPT-4.1 on the KCSAT STEM benchmark.

03 Uses less training compute than comparable models.

Abstract

We introduce HyperCLOVA X THINK, the first reasoning-focused large language model in the HyperCLOVA X family, pre-trained on roughly 6 trillion high-quality Korean and English tokens, augmented with targeted synthetic Korean data. It is implemented as a compute-memory-balanced Peri-LN Transformer scaled with $\mu$P, pre-trained through a three-stage curriculum that expands the context window to 128K tokens, and post-trained via supervised fine-tuning with Reinforcement Learning from Verifiable Rewards; it supports both detailed-rationale and concise-answer modes. It delivers competitive performance against similarly sized models on Korea-focused benchmarks such as KMMLU, CSAT, KoBALT-700, HAERAE-1.0, and KoBigBench, while preserving robust bilingual consistency and translation quality. In addition, a vision-augmented variant matches or exceeds GPT-4.1 on the KCSAT STEM benchmark, all…
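
The abstract names a Peri-LN Transformer without defining it. Assuming the term refers to the layout in which each sublayer's input and output are both layer-normalized before the residual addition, y = x + LN(sublayer(LN(x))), a minimal Python sketch of one such block might look like the following. The function names and the toy sublayer are hypothetical, the normalization omits learned scale and shift parameters, and nothing here is taken from the report's actual implementation.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance.
    Learned scale/shift parameters are omitted for brevity."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def peri_ln_block(x, sublayer):
    """Residual block with normalization on both sides of the sublayer.

    Assumed layout (not confirmed by the abstract):
        y = x + LN(sublayer(LN(x)))
    """
    h = layer_norm(x)      # input normalization, as in Pre-LN
    h = sublayer(h)        # attention or MLP would go here
    h = layer_norm(h)      # output normalization, the extra step
    return [a + b for a, b in zip(x, h)]

def toy_sublayer(h):
    # Hypothetical stand-in for attention/MLP: inflate activations 100x
    return [100.0 * v for v in h]

x = [1.0, 2.0, 3.0, 4.0]
y = peri_ln_block(x, toy_sublayer)

# The residual update stays normalized even though the sublayer
# multiplied its activations by 100.
update = [b - a for a, b in zip(x, y)]
update_mean = sum(update) / len(update)
update_var = sum((u - update_mean) ** 2 for u in update) / len(update)
```

In this sketch the output normalization bounds the magnitude of what each block adds to the residual stream, regardless of how large the sublayer's raw output is.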

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques