Domain-Specific Machine Translation to Translate Medicine Brochures in English to Sorani Kurdish
Mariam Shamal, Hossein Hassani

TL;DR
This paper presents a specialized Statistical Machine Translation (SMT) model for translating English medicine brochures into Sorani Kurdish. The goal is to improve access to health information for Kurdish-speaking communities, and both automatic and human evaluation indicate promising translation quality.
Contribution
The study develops and evaluates a domain-specific SMT model for English to Sorani Kurdish medical translation using a parallel corpus and post-processing techniques.
Findings
Post-processing unknown words with a medical dictionary yielded BLEU scores of 56.87, 31.05, and 40.01 on three new brochures
50% of professionals found translations consistent
83.3% rated translations accurate
Abstract
Access to Kurdish medicine brochures is limited, depriving Kurdish-speaking communities of critical health information. To address this problem, we developed a specialized Machine Translation (MT) model to translate English medicine brochures into Sorani Kurdish using a parallel corpus of 22,940 aligned sentence pairs from 319 brochures, sourced from two pharmaceutical companies in the Kurdistan Region of Iraq (KRI). We trained a Statistical Machine Translation (SMT) model using the Moses toolkit, conducting seven experiments that resulted in BLEU scores ranging from 22.65 to 48.93. We translated three new brochures to improve the evaluation process and encountered unknown words. We addressed unknown words through post-processing with a medical dictionary, resulting in BLEU scores of 56.87, 31.05, and 40.01. Human evaluation by native Kurdish-speaking pharmacists, physicians, and…
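The dictionary-based post-processing mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that unknown words survive in the translated output as untranslated Latin-script tokens, and that the medical dictionary is a simple English-to-Kurdish lookup table. The function name `replace_unknown_words`, the two dictionary entries, and the token pattern are hypothetical and are not taken from the paper.

```python
import re

# Illustrative entries only; the paper's actual medical dictionary is not shown here.
MEDICAL_DICT = {
    "paracetamol": "پاراسیتامۆڵ",
    "tablet": "حەب",
}

# Assumption: an unknown word appears in the output as a run of Latin letters.
LATIN_TOKEN = re.compile(r"[A-Za-z][A-Za-z\-]*")


def replace_unknown_words(translation, dictionary):
    """Replace leftover Latin-script tokens that have a dictionary entry."""

    def lookup(match):
        word = match.group(0)
        # Case-insensitive lookup; tokens without an entry are left unchanged.
        return dictionary.get(word.lower(), word)

    return LATIN_TOKEN.sub(lookup, translation)


if __name__ == "__main__":
    print(replace_unknown_words("Tablet paracetamol ibuprofen", MEDICAL_DICT))
```

Tokens with no dictionary entry (here `ibuprofen`) pass through unchanged, so they stay visible for manual review or later dictionary expansion.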
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices
