HyperCLOVA X 32B Think
NAVER Cloud HyperCLOVA X Team

TL;DR
HyperCLOVA X 32B Think is a Korean-focused vision-language model that emphasizes reasoning, multimodal understanding, and agentic behavior. It performs strongly on benchmarks against comparably sized models and is open-sourced to support further research.
Contribution
Introduces HyperCLOVA X 32B Think, a Korean-centric vision-language model with enhanced reasoning and agentic capabilities, and releases it as open source for academic and industrial use.
Findings
Strong performance on Korean text-to-text and vision-to-text benchmarks against comparably sized models
Effective multimodal understanding and reasoning
Strong performance on agent-oriented evaluation tasks
Abstract
In this report, we present HyperCLOVA X 32B Think, a vision-language model designed with particular emphasis on reasoning within the Korean linguistic and cultural context, as well as agentic ability. HyperCLOVA X 32B Think is pre-trained with a strong focus on reasoning capabilities and subsequently post-trained to support multimodal understanding, enhanced reasoning, agentic behaviors, and alignment with human preferences. Experimental evaluations against comparably sized models demonstrate that our model achieves strong performance on Korean text-to-text and vision-to-text benchmarks, as well as on agent-oriented evaluation tasks. By open-sourcing HyperCLOVA X 32B Think, we aim to support broader adoption and facilitate further research and innovation across both academic and industrial communities.
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
