Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?
Ipek Baris Schlicht, Zhixue Zhao, Burcu Sayin, Lucie Flek, and Paolo Rosso

TL;DR
This paper investigates the consistency of Large Language Models' responses to health questions across multiple languages, revealing significant inconsistencies that could impact healthcare information quality.
Contribution
It introduces a multilingual health inquiry dataset and a prompt-based evaluation method for cross-lingual response comparison in LLMs.
Findings
Significant response inconsistencies across languages
Multilingual dataset with disease categorization
Challenges for cross-lingual healthcare applications
Abstract
Equitable access to reliable health information is vital for public health, but the quality of online health resources varies by language, raising concerns about inconsistencies in Large Language Models (LLMs) for healthcare. In this study, we examine the consistency of responses provided by LLMs to health-related questions across English, German, Turkish, and Chinese. We largely expand the HealthFC dataset by categorizing health-related questions by disease type and broadening its multilingual scope with Turkish and Chinese translations. We reveal significant inconsistencies in responses that could spread healthcare misinformation. Our main contributions are 1) a multilingual health-related inquiry dataset with meta-information on disease categories, and 2) a novel prompt-based evaluation workflow that enables sub-dimensional comparisons between two languages through parsing. Our…
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Linguistics and Terminology Studies
