Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
Khalil Hennara, Muhammad Hreden, Mohamed Motaism Hamed, Zeina Aldallal, Sara Chrouf, and Safwan AlModhayan

TL;DR
Mutarjim is a small, efficient Arabic-English translation model that outperforms much larger models through an optimized two-phase training approach. The paper also introduces Tarjama-25, a new, broader benchmark for evaluating Arabic-English translation.
Contribution
This paper introduces Mutarjim, a compact model for bidirectional translation, and Tarjama-25, a new benchmark dataset for Arabic-English translation evaluation.
Findings
Mutarjim outperforms larger models on multiple benchmarks.
It achieves state-of-the-art results on the Tarjama-25 benchmark.
It rivals models up to 20 times larger while significantly reducing computational costs and training requirements.
Abstract
We introduce Mutarjim, a compact yet powerful language model for bidirectional Arabic-English translation. While large-scale LLMs have shown impressive progress in natural language processing tasks, including machine translation, smaller models can remain competitive. Leveraging this insight, we developed Mutarjim based on Kuwain-1.5B, a language model tailored for both Arabic and English. Despite its modest size, Mutarjim outperforms much larger models on several established benchmarks, a result achieved through an optimized two-phase training approach and a carefully curated, high-quality training corpus. Experimental results show that Mutarjim rivals models up to 20 times larger while significantly reducing computational costs and training requirements. We also introduce Tarjama-25, a new benchmark designed to overcome limitations in existing Arabic-English benchmarking datasets, such as domain narrowness, short…
