Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic
Sabri Boughorbel, Majd Hawasly

TL;DR
This study evaluates how well open LLMs follow multi-turn instructions in Arabic, using a customized Arabic translation of MT-Bench with GPT-4 as the evaluator. It reports how performance varies across models and task categories and considers ensemble strategies.
Contribution
It introduces an Arabic benchmark for evaluating multi-turn instruction following and compares the performance of open LLMs on it, highlighting the effectiveness of fine-tuning and ensemble approaches.
Findings
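Base models fine-tuned on multilingual, multi-turn datasets can be competitive with models trained from scratch on multilingual data.
Model responses vary across task categories (e.g., logic vs. literacy) depending on whether the instructions are given in English or Arabic.
An ensemble of small LLMs could rival proprietary models.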
Abstract
While significant progress has been made in benchmarking Large Language Models (LLMs) across various tasks, there is a lack of comprehensive evaluation of their abilities in responding to multi-turn instructions in less-commonly tested languages like Arabic. Our paper offers a detailed examination of the proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized Arabic translation of the MT-Bench benchmark suite, we employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks. Our findings reveal variations in model responses on different task categories, e.g., logic vs. literacy, when instructed in English or Arabic. We find that fine-tuned base models using multilingual and multi-turn datasets could be competitive with models trained from scratch on multilingual data. Finally, we…
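To make the per-category comparison in the abstract concrete, here is a minimal sketch of how judge scores might be averaged by task category and compared between English and Arabic. It is not the authors' code: the record layout, the model name `model-a`, the helper names, and all score values are hypothetical placeholders, and only the 1-10 scoring scale and the category names are borrowed from MT-Bench.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge scores on a 1-10 scale (the scale MT-Bench uses).
# These are placeholder values for illustration, NOT results from the paper.
records = [
    {"model": "model-a", "lang": "en", "category": "reasoning", "score": 6.0},
    {"model": "model-a", "lang": "ar", "category": "reasoning", "score": 4.0},
    {"model": "model-a", "lang": "en", "category": "reasoning", "score": 8.0},
    {"model": "model-a", "lang": "ar", "category": "reasoning", "score": 6.0},
    {"model": "model-a", "lang": "en", "category": "writing", "score": 8.0},
    {"model": "model-a", "lang": "ar", "category": "writing", "score": 7.0},
]


def mean_by_category(rows):
    """Average the judge score for each (model, language, category) group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["model"], row["lang"], row["category"])].append(row["score"])
    return {key: mean(scores) for key, scores in buckets.items()}


def language_gap(means, model, category, lang_a="en", lang_b="ar"):
    """Mean score in lang_a minus mean score in lang_b for one model and category."""
    return means[(model, lang_a, category)] - means[(model, lang_b, category)]


means = mean_by_category(records)
gap_reasoning = language_gap(means, "model-a", "reasoning")
gap_writing = language_gap(means, "model-a", "writing")
print(f"reasoning gap (en - ar): {gap_reasoning}")
print(f"writing gap (en - ar): {gap_writing}")
```

A positive gap means the model scored higher when instructed in English than in Arabic for that category. Comparing gaps across categories is one simple way to surface the kind of logic vs. literacy differences the abstract describes.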
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Absolute Position Encodings · Adam · Label Smoothing · Residual Connection
