Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs
Yuval Kansal, Shmuel Berman, Lydia Liu

TL;DR
This paper evaluates the factual accuracy of multilingual LLMs, specifically the Llama3.1 family, across languages and highlights three problems: extraneous information, reduced truthfulness, and bias against rare languages.
Contribution
It assesses how correctly the Llama3.1 family of models answers factual questions appropriate for middle and high school students in multiple languages, revealing significant biases and inaccuracies.
Findings
LLMs often provide extraneous information.
Performance drops in less common languages.
Biases against rare languages are exacerbated.
Abstract
Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
