Generalists vs. Specialists: Evaluating Large Language Models for Urdu
Samee Arif, Abdul Hameed Azeemi, Agha Ali Raza, Awais Athar

TL;DR
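This study compares general-purpose and special-purpose language models on Urdu NLP tasks. It finds that fine-tuned special-purpose models consistently outperform general-purpose ones, and that GPT-4-Turbo's evaluations of generated text align more closely with human judgments than those of the other LLM evaluators compared (Llama-3-8b and Claude 3.5 Sonnet).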
Contribution
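The paper evaluates large language models on Urdu, a low-resource language, across seven classification and seven generation tasks. It shows that fine-tuned special-purpose models outperform general-purpose ones and compares LLM-based evaluation of generated text against human evaluation.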
Findings
Fine-tuned special-purpose models (XLM-Roberta-large, mT5-large, Llama-3-8b) consistently outperform general-purpose models (GPT-4-Turbo, Llama-3-8b) across tasks.
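On the generation tasks, GPT-4-Turbo's evaluations align more closely with human judgments than those of the other LLM evaluators compared (Llama-3-8b and Claude 3.5 Sonnet).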
The results indicate how well current LLMs handle Urdu, a low-resource language with 70 million native speakers that remains underrepresented in NLP.
Abstract
In this paper, we compare general-purpose models, GPT-4-Turbo and Llama-3-8b, with special-purpose models--XLM-Roberta-large, mT5-large, and Llama-3-8b--that have been fine-tuned on specific tasks. We focus on seven classification and seven generation tasks to evaluate the performance of these models on the Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite the frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo, Llama-3-8b and Claude 3.5 Sonnet. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
