Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICs
David Restrepo, Chenwei Wu, Zhengxu Tang, Zitao Shuai, Thao Nguyen, Minh Phan, Jun-En Ding, Cong-Tinh Dao, Jack Gallifant, Robyn Gayle Dychiao, Jose Carlo Artiaga, André Hiroshi Bando, Carolina Pelegrini Barbosa Gracitelli, Vincenz Ferrer, Leo Anthony Celi

TL;DR
This paper introduces a multilingual ophthalmological question-answering benchmark and a novel de-biasing method, CLARA, to improve LLM performance and fairness across languages in medical applications for LMICs.
Contribution
It presents the first multilingual ophthalmological QA benchmark and proposes CLARA, a new de-biasing approach that enhances LLM performance and reduces language bias in medical contexts.
Findings
Substantial language bias in LLM performance for ophthalmological QA.
Existing de-biasing methods are insufficient for medical multilingual tasks.
CLARA significantly improves performance and reduces bias across languages.
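Because the benchmark questions are parallel across languages, the language bias described in the findings above can be measured by scoring the same questions in each language and comparing the results. The following is a minimal sketch of that idea, not the paper's evaluation code: the function names, the record layout, the placeholder language labels, and the max-minus-min gap are all assumptions made for illustration, and the toy records are invented rather than taken from the benchmark.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute per-language accuracy.

    `records` is an iterable of (language, question_id, is_correct) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, _question_id, is_correct in records:
        total[lang] += 1
        correct[lang] += int(is_correct)
    return {lang: correct[lang] / total[lang] for lang in total}


def language_gap(accuracy):
    """Spread between the best and worst language (one simple bias proxy)."""
    return max(accuracy.values()) - min(accuracy.values())


# Toy, invented results for four parallel questions in three placeholder languages.
records = [
    ("lang_a", 1, True), ("lang_a", 2, True), ("lang_a", 3, True), ("lang_a", 4, False),
    ("lang_b", 1, True), ("lang_b", 2, False), ("lang_b", 3, True), ("lang_b", 4, False),
    ("lang_c", 1, False), ("lang_c", 2, False), ("lang_c", 3, True), ("lang_c", 4, False),
]

acc = accuracy_by_language(records)
gap = language_gap(acc)
print(acc)
print(f"accuracy gap: {gap:.2f}")
```

A gap near zero would indicate similar accuracy across languages under this proxy; other fairness measures could be substituted without changing the overall structure.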
Abstract
Current ophthalmology clinical workflows are plagued by over-referrals, long waits, and complex and heterogeneous medical records. Large language models (LLMs) present a promising solution to automate various procedures such as triaging, preliminary tests like visual acuity assessment, and report summaries. However, LLMs have demonstrated significantly varied performance across different languages in natural language question-answering tasks, potentially exacerbating healthcare disparities in Low and Middle-Income Countries (LMICs). This study introduces the first multilingual ophthalmological question-answering benchmark with manually curated questions parallel across languages, allowing for direct cross-lingual comparisons. Our evaluation of 6 popular LLMs across 7 different languages reveals substantial bias across different languages, highlighting risks for clinical deployment of…
Taxonomy
Topics: Retinal Imaging and Analysis · Biomedical Text Mining and Ontologies · Acute Ischemic Stroke Management
