Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic

Sabri Boughorbel; Majd Hawasly

arXiv:2310.14819 · cs.CL · October 24, 2023 · 2 cites

TL;DR

This study evaluates how well open LLMs follow multi-turn instructions in Arabic, using a customized Arabic translation of MT-Bench with GPT-4 as the evaluator. It finds that performance varies by task category and instruction language, and suggests that an ensemble of small LLMs could rival proprietary models.

Contribution

The paper introduces a customized Arabic translation of the MT-Bench suite for multi-turn instruction evaluation and uses it to compare open LLMs, highlighting the potential of fine-tuning and ensemble approaches.

Findings

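1. Base models fine-tuned on multilingual, multi-turn datasets can be competitive with models trained from scratch on multilingual data.

2. Model responses vary across task categories, such as logic and literacy, depending on whether the instructions are given in English or Arabic.

3. An ensemble of small LLMs could rival proprietary models.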

Abstract

While significant progress has been made in benchmarking Large Language Models (LLMs) across various tasks, there is a lack of comprehensive evaluation of their abilities in responding to multi-turn instructions in less-commonly tested languages like Arabic. Our paper offers a detailed examination of the proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized Arabic translation of the MT-Bench benchmark suite, we employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks. Our findings reveal variations in model responses on different task categories, e.g., logic vs. literacy, when instructed in English or Arabic. We find that fine-tuned base models using multilingual and multi-turn datasets could be competitive with models trained from scratch on multilingual data. Finally, we…
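The comparison described in the abstract, in which a single judge scores responses to English and Arabic queries across task categories, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the model names, the category labels, the 1-10 score values, and the helper functions `mean_scores` and `language_gap` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge scores on a 1-10 scale. Model names, categories, and
# values are placeholders for illustration, NOT results from the paper.
records = [
    {"model": "model_a", "lang": "en", "category": "reasoning", "score": 6},
    {"model": "model_a", "lang": "ar", "category": "reasoning", "score": 4},
    {"model": "model_a", "lang": "en", "category": "writing", "score": 8},
    {"model": "model_a", "lang": "ar", "category": "writing", "score": 7},
    {"model": "model_b", "lang": "en", "category": "reasoning", "score": 5},
    {"model": "model_b", "lang": "ar", "category": "reasoning", "score": 5},
    {"model": "model_b", "lang": "en", "category": "writing", "score": 7},
    {"model": "model_b", "lang": "ar", "category": "writing", "score": 3},
]


def mean_scores(records):
    """Average the judge score for each (model, language, category) triple."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["model"], r["lang"], r["category"])].append(r["score"])
    return {key: mean(values) for key, values in buckets.items()}


def language_gap(scores, src="en", tgt="ar"):
    """Per (model, category), the mean score in `src` minus the mean in `tgt`."""
    gaps = {}
    for (model, lang, category), value in scores.items():
        if lang != src:
            continue
        other = scores.get((model, tgt, category))
        if other is not None:
            gaps[(model, category)] = value - other
    return gaps


scores = mean_scores(records)
gaps = language_gap(scores)

for (model, category), gap in sorted(gaps.items()):
    print(f"{model:8s} {category:10s} en-ar gap: {gap:+.1f}")
```

A positive gap means the hypothetical model scored higher when instructed in English than in Arabic for that category. Breaking the gap out per category, rather than averaging over all tasks, is what makes category-level differences visible.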

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Absolute Position Encodings · Adam · Label Smoothing · Residual Connection