Diffutron: A Masked Diffusion Language Model for Turkish Language
Şuayp Talha Kocabay, Talha Rüzgar Akkuş

TL;DR
Diffutron is a resource-efficient masked diffusion language model tailored for Turkish, demonstrating competitive performance in non-autoregressive text generation through a multi-stage training process.
Contribution
Introduces Diffutron, a novel masked diffusion language model for Turkish, utilizing a multi-stage training pipeline with continual pre-training and instruction tuning.
Findings
Achieves competitive results with smaller model size
Effective multi-stage training improves Turkish language modeling
Validates masked diffusion approach for morphologically rich languages
Abstract
Masked Diffusion Language Models (MDLMs) have emerged as a compelling non-autoregressive alternative to standard large language models; however, their application to morphologically rich languages remains limited. In this paper, we introduce Diffutron, a masked diffusion language model specifically designed for Turkish. Our approach leverages a resource-efficient training pipeline, starting with LoRA-based continual pre-training of a multilingual encoder on a large-scale corpus. To enable generative capabilities, we employ a progressive instruction-tuning strategy, sequentially adapting the model on general and task-specific instruction sets. Experimental results across comprehensive benchmarks demonstrate that, despite its compact size, our model achieves competitive performance compared to existing multi-billion-parameter baselines. These findings validate the effectiveness…
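For readers unfamiliar with masked diffusion, the forward (noising) step that such models are generally trained to invert can be sketched as below. This is a generic illustration, not Diffutron's implementation: the abstract does not specify the noise schedule, tokenizer, or mask token, so the `MASK` string, the `forward_mask` helper, the whitespace tokenization, and the example sentence are all assumptions made for the sketch.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the paper's tokenizer may differ


def forward_mask(tokens, t, rng):
    """Replace each token with MASK independently with probability t.

    t is the noise level in [0, 1]: t = 0 leaves the sequence intact,
    t = 1 masks every position. A denoising model would be trained to
    predict the original tokens at the masked positions.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_positions(noised):
    """Indices the denoiser would be asked to reconstruct."""
    return [i for i, tok in enumerate(noised) if tok == MASK]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    tokens = "Bugün hava çok güzel".split()  # illustrative Turkish sentence
    for t in (0.0, 0.5, 1.0):
        noised = forward_mask(tokens, t, rng)
        print(t, noised, masked_positions(noised))
```

At generation time a model of this kind typically starts from a fully masked sequence and fills in positions over several steps, which is what makes decoding non-autoregressive.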
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Computational and Text Analysis Methods
