Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team: Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, Seungjae Jung, Daniel Wontae Nam, Taegyeong Eo, Donghun Lee, Doohae Jung, Boseop Kim, Nayeon Kim, Jaesun Park, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Kyoung-Woon On, Seulye Baeg, Junrae Cho, Sunghee Jung

TL;DR
Kanana introduces a series of compute-efficient bilingual language models optimized for Korean and English. The models achieve high performance at lower computational cost through techniques such as high-quality data filtering, staged pre-training, depth up-scaling, and pruning with distillation.
Contribution
The paper presents a new series of Korean-English bilingual models trained at reduced computational cost, and details the methods used for their pre-training, post-training (supervised fine-tuning and preference optimization), and adaptation to specific tasks.
Findings
Superior performance in Korean
Competitive performance in English
Significantly lower computational cost than state-of-the-art models of similar size
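A common way to compare training compute across models is the rule of thumb C ≈ 6·N·D FLOPs for dense transformers, where N is the parameter count and D the number of training tokens. The sketch below applies that approximation to two models of the same size. The function name and the parameter and token figures are hypothetical illustrations, not numbers reported for Kanana, and the report's own compute accounting may differ.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: C ~= 6 * N * D FLOPs.

    Assumption: ignores attention-specific and embedding terms; this is an
    approximation, not the accounting used in the Kanana report.
    """
    return 6.0 * num_params * num_tokens


# Hypothetical figures for illustration only (not taken from the report).
model_a = training_flops(num_params=8e9, num_tokens=3e12)
model_b = training_flops(num_params=8e9, num_tokens=15e12)

print(f"model A: {model_a:.2e} FLOPs")
print(f"model B: {model_b:.2e} FLOPs")
print(f"B / A compute ratio: {model_b / model_a:.1f}x")
```

At a fixed parameter count the compute ratio reduces to the ratio of training tokens, which is why training-token budgets are often what separates models of similar size.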
Abstract
We introduce Kanana, a series of bilingual language models that demonstrate superior performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies used during post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for language model adaptation to specific scenarios, such as embedding, retrieval augmented generation, and…
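The abstract lists pruning and distillation among the pre-training techniques. As a minimal sketch, assuming the standard temperature-scaled knowledge-distillation objective (Hinton et al., 2015) applied to next-token logits, a smaller student (for example, a pruned model) is trained to match the teacher's softened output distribution. Kanana's actual objective, temperature, and any mixing with the ordinary language-modeling loss are described in the report, not here; the function names and toy logits below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures
    (standard Hinton-style KD; an assumption, not Kanana's confirmed recipe).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # terms with zero teacher mass contribute nothing
    )
    return temperature ** 2 * kl


# Toy next-token logits over a hypothetical 4-token vocabulary.
teacher = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.0, -0.5]
print(f"distillation loss: {distillation_loss(student, teacher):.4f}")
```

The loss is zero when the student's distribution matches the teacher's and positive otherwise. In practice it is averaged over token positions and is often combined with the usual cross-entropy loss on ground-truth tokens.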
Code & Models
- 🤗 kakaocorp/kanana-1.5-2.1b-base · model · 199 dl · ♡ 12
- 🤗 kakaocorp/kanana-nano-2.1b-instruct · model · 2.6k dl · ♡ 75
- 🤗 kakaocorp/kanana-nano-2.1b-base · model · 996 dl · ♡ 39
- 🤗 kakaocorp/kanana-nano-2.1b-embedding · model · 1.7k dl · ♡ 27
- 🤗 datalama/kanana-nano-2.1b-embedding · model · 3 dl
- 🤗 DimensionSTP/kanana-nano-2.1b-instruct-Ko-Reasoning · model · 3 dl
- 🤗 kakaocorp/kanana-1.5-8b-base · model · 1.1k dl · ♡ 12
- 🤗 kakaocorp/kanana-1.5-8b-instruct-2505 · model · 6.4k dl · ♡ 57
- 🤗 kakaocorp/kanana-1.5-2.1b-instruct-2505 · model · 6.5k dl · ♡ 37
- 🤗 BCCard/kanana-1.5-8b-instruct-2505-FP8-Dynamic · model · 2 dl · ♡ 1
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Pruning
