HyperCLOVA X 32B Think

NAVER Cloud HyperCLOVA X Team

arXiv:2601.03286·cs.CV·January 8, 2026

TL;DR

HyperCLOVA X 32B Think is a vision-language model focused on reasoning in the Korean linguistic and cultural context and on agentic ability. Against comparably sized models it performs strongly on Korean text-to-text, vision-to-text, and agent-oriented benchmarks, and it is open-sourced to support further research.

Contribution

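Introduces HyperCLOVA X 32B Think, a Korean-centric vision-language model that is pre-trained with a focus on reasoning and then post-trained for multimodal understanding, enhanced reasoning, agentic behaviors, and alignment with human preferences. The model is open-sourced for use by academic and industrial communities.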

Findings

01

Strong performance on Korean text-to-text benchmarks

02

Strong performance on Korean vision-to-text benchmarks

03

Strong performance on agent-oriented evaluation tasks

Abstract

In this report, we present HyperCLOVA X 32B Think, a vision-language model designed with particular emphasis on reasoning within the Korean linguistic and cultural context, as well as agentic ability. HyperCLOVA X 32B Think is pre-trained with a strong focus on reasoning capabilities and subsequently post-trained to support multimodal understanding, enhanced reasoning, agentic behaviors, and alignment with human preferences. Experimental evaluations against comparably sized models demonstrate that our model achieves strong performance on Korean text-to-text and vision-to-text benchmarks, as well as on agent-oriented evaluation tasks. By open-sourcing HyperCLOVA X 32B Think, we aim to support broader adoption and facilitate further research and innovation across both academic and industrial communities.

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques