Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh

TL;DR
This paper introduces a pipeline for evaluating the factual accuracy of long-form multilingual LLM-generated texts across diverse languages and regions, using FActScore, and provides guidelines for multilingual factual evaluation.
Contribution
It presents a simple, effective pipeline for multilingual factuality assessment of long-form texts using FActScore and offers comprehensive guidelines for regional diversity evaluation.
Findings
FActScore can be effectively applied to multiple languages.
Multilingual LLMs show varying factual accuracy across regions.
Guidelines facilitate standardized multilingual factual evaluation.
Abstract
Evaluating the factuality of long-form large language model (LLM)-generated text is an important challenge. Recently, there has been a surge of interest in factuality evaluation for English, but little is known about the factuality evaluation of multilingual LLMs, especially when it comes to long-form generation. This paper systematically evaluates multilingual LLMs' factual accuracy across languages and geographic regions. We introduce a simple pipeline for multilingual factuality evaluation by applying FActScore (Min et al., 2023) to diverse languages. In addition to evaluating multilingual factual generation, we evaluate the factual accuracy of long-form text generation in topics that reflect regional diversity. We also examine the feasibility of running the FActScore pipeline using non-English Wikipedia and provide comprehensive guidelines on multilingual factual evaluation for…
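For readers unfamiliar with the metric, FActScore (Min et al., 2023) is the fraction of atomic facts in a generation that a knowledge source supports. The sketch below illustrates only that final scoring step. The function names, the `is_supported` callback, and the toy knowledge set are hypothetical and are not taken from the paper's code; in a real pipeline, splitting text into atomic facts and verifying each one would be done by a model or retrieval system.

```python
from typing import Callable, Iterable, List


def factscore(atomic_facts: Iterable[str],
              is_supported: Callable[[str], bool]) -> float:
    """Fraction of atomic facts judged supported by a knowledge source.

    `is_supported` is a hypothetical verifier supplied by the caller.
    """
    facts = list(atomic_facts)
    if not facts:
        raise ValueError("a generation needs at least one atomic fact")
    supported = sum(1 for fact in facts if is_supported(fact))
    return supported / len(facts)


def mean_factscore(generations: List[List[str]],
                   is_supported: Callable[[str], bool]) -> float:
    """Average per-generation score over a set of generations."""
    if not generations:
        raise ValueError("no generations to score")
    scores = [factscore(facts, is_supported) for facts in generations]
    return sum(scores) / len(scores)


# Toy knowledge source: a set of statements treated as true (assumption).
knowledge = {
    "Seoul is the capital of South Korea.",
    "Dhaka is the capital of Bangladesh.",
    "Nairobi is the capital of Kenya.",
}


def verifier(fact: str) -> bool:
    # Exact-match lookup stands in for a real fact checker.
    return fact in knowledge


example_facts = [
    "Seoul is the capital of South Korea.",
    "Dhaka is the capital of Bangladesh.",
    "Nairobi is the capital of Kenya.",
    "Busan is the capital of South Korea.",
]
example_score = factscore(example_facts, verifier)
```

Here three of the four toy facts are in the knowledge set, so `example_score` is 0.75. Comparing such scores across languages or regions is the kind of analysis the abstract describes, although the paper's actual verification procedure may differ from this simplified lookup.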
Taxonomy
Topics: Natural Language Processing Techniques
