Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
Wei-Rui Chen, Ife Adebara, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

TL;DR
This study evaluates ChatGPT's ability to identify 670 languages from 24 language families, revealing its limitations, especially with low-resource languages, and highlighting the need for further development.
Contribution
It introduces Babel-670, a multilingual benchmark of 670 languages, and systematically assesses ChatGPT's language identification capabilities under zero- and few-shot conditions, with and without a provided label set.
Findings
ChatGPT performs poorly on African and low-resource languages.
It lags behind smaller, finetuned language identification tools.
Current models need improvement for diverse language support.
Abstract
ChatGPT has recently emerged as a powerful NLP tool that can carry out a variety of tasks. However, the range of languages ChatGPT can handle remains largely a mystery. To uncover which languages ChatGPT 'knows', we investigate its language identification (LID) abilities. For this purpose, we compile Babel-670, a benchmark comprising 670 languages representing 24 language families spoken in five continents. Languages in Babel-670 run the gamut from the very high-resource to the very low-resource. We then study ChatGPT's (both GPT-3.5 and GPT-4) ability to (i) identify language names and language codes, (ii) under zero- and few-shot conditions, (iii) with and without provision of a label set. When compared to smaller finetuned LID tools, we find that ChatGPT lags behind. For example, it has poor performance on African languages. We conclude that current large language models would benefit…
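The three experimental axes named in the abstract (name vs. code, zero- vs. few-shot, with vs. without a label set) can be pictured as a small grid of prompt configurations. The following is a minimal sketch under stated assumptions: the function names `build_prompt` and `conditions`, the prompt wording, and the example strings are all hypothetical illustrations and are not taken from the paper.

```python
from itertools import product


def build_prompt(text, target="name", examples=(), label_set=None):
    """Assemble a hypothetical LID prompt (wording is illustrative only)."""
    # target is "name" (e.g. a language name) or "code" (e.g. a language code).
    what = "name" if target == "name" else "code"
    lines = [f"Identify the language {what} of the text."]
    # Optional label set: restrict the answer to a closed list of labels.
    if label_set:
        lines.append("Choose one of: " + ", ".join(label_set))
    # Few-shot demonstrations; an empty sequence gives the zero-shot case.
    for ex_text, ex_label in examples:
        lines.append(f"Text: {ex_text}\nLanguage: {ex_label}")
    lines.append(f"Text: {text}\nLanguage:")
    return "\n".join(lines)


def conditions():
    """Enumerate the grid: target x shot setting x label-set provision."""
    return list(product(("name", "code"),
                        ("zero-shot", "few-shot"),
                        (True, False)))


if __name__ == "__main__":
    for cond in conditions():
        print(cond)
    print(build_prompt("Bonjour le monde", label_set=["French", "Hausa"]))
```

Each tuple from `conditions()` corresponds to one evaluation setting; a real evaluation would send the prompt to the model and compare the returned label against the gold label.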
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Adam · Softmax · Dense Connections · Linear Layer
