Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model
Yen-Ting Lin, Yun-Nung Chen

TL;DR
Taiwan LLM is a culturally aligned language model specifically designed for Traditional Chinese as used in Taiwan, achieving superior understanding and generation by incorporating linguistic and cultural nuances.
Contribution
It introduces the first culturally resonant Large Language Model for Traditional Chinese, tailored to Taiwanese linguistic and cultural specifics, with open-source resources.
Findings
Outperforms existing models on Traditional Chinese tasks
Achieves high accuracy in understanding and generating Taiwanese Traditional Chinese
Open-source release fosters further research and collaboration
Abstract
In the realm of language models, the nuanced linguistic and cultural intricacies of Traditional Chinese, as spoken in Taiwan, have been largely overlooked. This paper introduces Taiwan LLM, a pioneering Large Language Model that specifically caters to the Traditional Chinese language, with a focus on the variant used in Taiwan. Leveraging a comprehensive pretraining corpus and instruction-finetuning datasets, we have developed a model that not only understands the complexities of Traditional Chinese but also embodies the cultural context of Taiwan. Taiwan LLM represents the first of its kind, a model that is not only linguistically accurate but also culturally resonant with its user base. Our evaluations demonstrate that Taiwan LLM achieves superior performance in understanding and generating Traditional Chinese text, outperforming existing models that are predominantly trained on…
Code & Models
- 🤗 yentinglin/Taiwan-LLaMa-v0.0 · model · 6 dl · ♡ 1
- 🤗 yentinglin/Taiwan-LLaMa-v0.9 · model · 7 dl
- 🤗 yentinglin/Taiwan-LLaMa-v1.0 · model · 9 dl · ♡ 78
- 🤗 yentinglin/Taiwan-LLM-7B-v2.0-base · model · 16 dl · ♡ 15
- 🤗 yentinglin/Taiwan-LLM-7B-v2.1-base · model · 13 dl
- 🤗 yentinglin/Taiwan-LLM-7B-v2.0-chat · model · 40 dl · ♡ 10
- 🤗 yentinglin/Taiwan-LLM-7B-v2.0.1-chat · model · 536 dl · ♡ 33
- 🤗 yentinglin/Taiwan-LLM-7B-v2.1-chat · model · 46 dl · ♡ 31
- 🤗 yentinglin/Taiwan-LLM-13B-v2.0-base · model · 16 dl · ♡ 2
- 🤗 yentinglin/Taiwan-LLM-13B-v2.0-chat · model · 31 dl · ♡ 51
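As a minimal sketch of how one of the checkpoints listed above might be loaded, the snippet below uses the generic Hugging Face transformers causal-LM interface. Only the repository ID comes from the list; the helper name `generate`, the plain-text prompt, and the generation settings are assumptions for illustration. The chat checkpoints may expect a specific prompt template that is not described here, so consult the model card before relying on the output.

```python
# Repository ID copied from the model list; everything else is an illustrative assumption.
REPO_ID = "yentinglin/Taiwan-LLM-7B-v2.1-chat"


def generate(prompt: str, repo_id: str = REPO_ID, max_new_tokens: int = 128) -> str:
    """Load a checkpoint from the Hugging Face Hub and return the generated continuation."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Tokenize the raw prompt (assumption: no chat template is applied here).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads several gigabytes of weights on first use):
# print(generate("請簡單介紹台北101。"))
```

Calling `generate` with a different `repo_id` from the list, such as one of the base checkpoints, would follow the same pattern, though base models are not instruction-tuned and will simply continue the prompt.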
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
