Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs

Yuval Kansal; Shmuel Berman; Lydia Liu

arXiv:2506.03051 · cs.CL · June 9, 2025

TL;DR

This paper evaluates the factual accuracy of multilingual LLMs, specifically the Llama3.1 family of models, across various languages, and highlights problems with extraneous information, truthfulness, and language bias.

Contribution

It assesses how accurately the Llama3.1 models answer factual questions suitable for middle and high school students in multiple languages, revealing significant biases and inaccuracies.

Findings

01

LLMs often provide extraneous information.

02

Performance drops in less common languages.

03

Biases against rare languages are exacerbated.

Abstract

Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling