Enhancing Reasoning Skills in Small Persian Medical Language Models Can Outperform Large-Scale Data Training

Mehrdad Ghassabi; Sadra Hakim; Hamidreza Baradaran Kashani; Pedram Rostami

arXiv:2510.20059 · cs.CL · January 7, 2026

Open Access

TL;DR

This paper shows that a small Persian medical language model, trained with Reinforcement Learning with AI Feedback (RLAIF) and Direct Preference Optimization (DPO) on a limited dataset, can surpass larger models in medical reasoning.

Contribution

The study introduces a novel training approach that combines RLAIF and DPO to improve the reasoning of small, domain-specific Persian language models.
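
For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines. This is the standard published DPO loss, not code from the paper; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of each full response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (hypothetical helper, not the authors' code).

    Each argument is the summed log-probability of a full response:
    "chosen" is the preferred answer, "rejected" the dispreferred one.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # The loss is -log(sigmoid(margin)); it shrinks as the policy favors
    # the chosen response more than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Numerically stable form of -log(sigmoid(margin)) = log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree on both responses, the margin is zero and the loss equals log 2; training lowers it by raising the relative likelihood of the preferred reasoning trajectory.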

Findings

01. Small models outperform larger ones in medical reasoning tasks.

02. Reinforcement learning with AI feedback enhances reasoning capabilities.

03. Limited data can be effectively used to develop domain-specific models.

Abstract

Enhancing reasoning capabilities in small language models is critical for specialized applications such as medical question answering, particularly in underrepresented languages like Persian. In this study, we employ Reinforcement Learning with AI Feedback (RLAIF) and Direct Preference Optimization (DPO) to improve the reasoning skills of a general-purpose Persian language model. To achieve this, we translated a multiple-choice medical question-answering dataset into Persian and used RLAIF to generate preferred-rejected answer pairs, which are essential for DPO training. By prompting both teacher and student models to produce Chain-of-Thought (CoT) reasoning responses, we compiled a dataset containing correct and incorrect reasoning trajectories. This dataset, comprising 2 million tokens in preferred answers and 2.5 million tokens in rejected ones, was used to train a baseline model,…
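
The pairing step described in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification, that a trajectory counts as preferred when its final answer matches the dataset's answer key and as rejected otherwise; the paper's actual AI-feedback procedure may differ. The `Trajectory` class, the `build_preference_pairs` function, and the `prompt`/`chosen`/`rejected` field names are hypothetical, the last following a common convention for DPO datasets.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Trajectory:
    """One Chain-of-Thought response to a multiple-choice question."""
    question_id: str
    prompt: str
    response: str      # full reasoning text
    final_answer: str  # option label extracted from the response
    source: str        # "teacher" or "student"


def build_preference_pairs(trajectories, answer_key, max_pairs_per_question=1):
    """Pair correct and incorrect trajectories for the same question.

    Assumption: correctness is judged against answer_key, a dict mapping
    question_id to the gold option label.
    """
    # Group trajectories by question so pairs share the same prompt.
    by_question = {}
    for t in trajectories:
        by_question.setdefault(t.question_id, []).append(t)

    pairs = []
    for qid, items in by_question.items():
        gold = answer_key[qid]
        correct = [t for t in items if t.final_answer == gold]
        wrong = [t for t in items if t.final_answer != gold]
        # Questions with no correct or no incorrect trajectory yield no pair.
        candidates = list(product(correct, wrong))[:max_pairs_per_question]
        for chosen, rejected in candidates:
            pairs.append({
                "prompt": chosen.prompt,
                "chosen": chosen.response,
                "rejected": rejected.response,
            })
    return pairs
```

Questions where every sampled trajectory is correct, or every one is wrong, contribute nothing under this scheme, which is one reason to sample from both a teacher and a student model.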

Code & Models

Models

gaokerena/gaokerena-r1.0 (Hugging Face model · 2 downloads · 1 like)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques