TULIP: Adapting Open-Source Large Language Models for Underrepresented Languages and Specialized Financial Tasks

İrem Demirtaş; Burak Payzun; Seçil Arslan

arXiv:2508.16243·cs.CL·August 25, 2025

TL;DR

This paper introduces TULIP, a pipeline for adapting open-source large language models to financial Turkish, improving their domain-specific and language capabilities for privacy-sensitive applications.

Contribution

It presents a novel five-stage pipeline for domain and language adaptation of Llama 3.1 and Qwen 2.5 models specifically for financial Turkish tasks.

Findings

01 Enhanced model performance on financial Turkish tasks

02 Effective domain and language adaptation demonstrated

03 Pipeline enables smaller models to handle specialized tasks

Abstract

Thanks to the growing popularity of large language models over the years, there is great potential for their applications in finance. Despite the exceptional performance of larger proprietary models, which are presented as black-box solutions through APIs, smaller models that can be hosted on-premise present opportunities for adaptability and privacy. Especially in cases where the management of sensitive information and application of domain knowledge is important, like finance, enhancing the capabilities of smaller models becomes crucial, notably for underrepresented languages. In this work, we introduce TULIP models, which adapt Llama 3.1 8B and Qwen 2.5 7B for domain and language adaptation, focusing on financial Turkish use cases. The five-stage development pipeline involves data collection, continual pre-training (CPT), benchmark design, synthetic data generation and supervised…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational Physics and Python Applications