Optimizing Large Language Models for Turkish: New Methodologies in Corpus Selection and Training

H. Toprak Kesgin; M. Kaan Yuce; Eren Dogan; M. Egemen Uzun; Atahan Uz; Elif Ince; Yusuf Erdem; Osama Shbib; Ahmed Zeer; M. Fatih Amasyali

arXiv:2412.02775·cs.CL·December 5, 2024

TL;DR

This paper introduces new corpus selection and training methodologies to enhance Turkish language models, leveraging adapted datasets and merging techniques to significantly improve accuracy and comprehension in under-resourced language settings.

Contribution

It presents novel corpus adaptation and merging strategies specifically designed for Turkish, demonstrating substantial performance improvements over existing models.

Findings

01 Enhanced model accuracy in few-shot and zero-shot scenarios

02 Improved task-specific performance and language comprehension

03 Effective merging of adapted models boosts overall performance

Abstract

In this study, we develop and assess new corpus selection and training methodologies to improve the effectiveness of Turkish language models. Specifically, we adapted Large Language Model-generated datasets and translated English datasets into Turkish, integrating these resources into the training process. This approach led to substantial enhancements in model accuracy for both few-shot and zero-shot learning scenarios. Furthermore, the merging of these adapted models was found to markedly improve their performance. Human evaluative metrics, including task-specific performance assessments, further demonstrated that these adapted models possess a greater aptitude for comprehending the Turkish language and addressing logic-based queries. This research underscores the importance of refining corpus selection strategies to optimize the performance of multilingual models, particularly for…
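The abstract reports that merging the adapted models improved performance but does not say here which merging technique was used. As an illustration only, the sketch below shows linear weight averaging, one common way to merge models that share an architecture. The function name `merge_linear`, the toy parameter names, and the use of plain Python lists in place of tensors are all assumptions made for this example; they are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_linear(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Merge checkpoints by a weighted average of each parameter (illustrative only)."""
    if not models:
        raise ValueError("need at least one model")
    if len(models) != len(weights):
        raise ValueError("one mixing weight is required per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("mixing weights must sum to a positive value")
    # Normalize so the mixing coefficients sum to 1.
    coeffs = [w / total for w in weights]

    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("all models must have the same parameter names")

    merged: StateDict = {}
    for name in names:
        length = len(models[0][name])
        if any(len(m[name]) != length for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(length)
        ]
    return merged


# Toy example: two "adapted" checkpoints with a single two-element parameter.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 4.0]}
merged = merge_linear([model_a, model_b], weights=[1.0, 1.0])
print(merged)
```

With real checkpoints the same idea would be applied to tensors rather than lists, and other merging schemes exist; which one the authors used should be checked against the full paper.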
