CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare

Akash Ghosh; Srivarshinee Sridhar; Raghav Kaushik Ravi; Muhsin Muhsin; Sriparna Saha; Chirag Agarwal

arXiv:2512.11437 · cs.CL · December 15, 2025


Open Access

TL;DR

CLINIC introduces a comprehensive multilingual benchmark to evaluate healthcare language models on trustworthiness aspects like truthfulness, fairness, safety, robustness, and privacy across diverse languages and healthcare topics.

Contribution

This work presents CLINIC, the first extensive multilingual benchmark for assessing the trustworthiness of healthcare language models across five key dimensions and 18 tasks in 15 languages.

Findings

01 LMs struggle with factual accuracy
02 Bias exists across demographic and linguistic groups
03 Models are vulnerable to privacy breaches and adversarial attacks

Abstract

Integrating language models (LMs) into healthcare systems holds great promise for improving medical workflows and decision-making. However, a critical barrier to their real-world adoption is the lack of reliable evaluation of their trustworthiness, especially in multilingual healthcare settings. Existing LMs are predominantly trained on high-resource languages, which leaves them ill-equipped to handle the complexity and diversity of healthcare queries in mid- and low-resource languages and poses significant challenges for deploying them in global healthcare contexts, where linguistic diversity is key. In this work, we present CLINIC, a Comprehensive Multilingual Benchmark to evaluate the trustworthiness of language models in healthcare. CLINIC systematically benchmarks LMs across five key dimensions of trustworthiness: truthfulness, fairness, safety, robustness, and privacy, operationalized…


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Global Health and Surgery