Mind the Gap: Evaluating the Representativeness of Quantitative Medical Language Reasoning LLM Benchmarks for African Disease Burdens
Fred Mutisya (1, 2), Shikoh Gitau (1), Christine Syovata (2), Diana Oigara (2), Ibrahim Matende (2), Muna Aden (2), Munira Ali (2), Ryan Nyotu (2), Diana Marion (2), Job Nyangena (2), Nasubo Ongoma (1), Keith Mbae (1), Elizabeth Wamicha (1), Eric Mibuari (1)

TL;DR
This paper evaluates the representativeness of medical language model benchmarks for African disease burdens, highlighting significant underrepresentation of diseases prevalent in Africa and proposing a guideline-based benchmark to improve evaluation relevance.
Contribution
It introduces Alama Health QA, a regionally curated benchmark aligned with African clinical guidelines, addressing the gap in existing benchmarks.
Findings
Alama Health QA captures over 40% of NTD mentions in evaluated corpora.
Global benchmarks lack representation of key African diseases such as sickle cell disease.
Existing benchmarks often do not align with regional clinical guidelines.
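The "over 40% of NTD mentions" finding is a share of mentions pooled across all evaluated corpora. The sketch below shows one way such a share could be computed. It is not the authors' pipeline. The `NTD_TERMS` lexicon and the functions `count_ntd_mentions` and `mention_shares` are hypothetical, and the paper's actual term list and matching rules are not given here.

```python
import re

# Hypothetical, deliberately small NTD lexicon for illustration only;
# the authors' actual term list is not specified in this summary.
NTD_TERMS = [
    "schistosomiasis", "leishmaniasis", "trypanosomiasis",
    "onchocerciasis", "trachoma", "leprosy", "lymphatic filariasis",
]


def count_ntd_mentions(texts, terms=NTD_TERMS):
    """Count case-insensitive whole-word matches of NTD terms across a corpus."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return sum(len(pattern.findall(text)) for text in texts)


def mention_shares(corpora):
    """Map each corpus name to its fraction of all NTD mentions pooled across corpora."""
    counts = {name: count_ntd_mentions(texts) for name, texts in corpora.items()}
    total = sum(counts.values())
    # Guard against division by zero when no corpus mentions any NTD.
    return {name: (c / total if total else 0.0) for name, c in counts.items()}


corpora = {
    "A": ["Treat schistosomiasis with praziquantel.", "Trachoma and leprosy screening."],
    "B": ["Leishmaniasis is rare here.", "Chest pain workup."],
}
print(mention_shares(corpora))  # {'A': 0.75, 'B': 0.25}
```

A share metric like this rewards absolute volume. A large corpus can hold a big share of mentions even when NTDs make up only a small fraction of its items, so it is best read alongside a per-item proportion.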
Abstract
Introduction: Existing medical LLM benchmarks largely reflect examination syllabi and disease profiles from high-income settings. This raises questions about their validity for African deployment, where malaria, HIV, TB, sickle cell disease and neglected tropical diseases (NTDs) dominate the disease burden and national guidelines drive care. Methodology: We systematically reviewed 31 quantitative LLM evaluation papers (January 2019 to May 2025), identifying 19 English medical QA benchmarks. Alama Health QA was developed using a retrieval-augmented generation framework anchored on the Kenyan Clinical Practice Guidelines. Five widely used sets (AfriMedQA, MMLU-Medical, PubMedQA, MedMCQA and MedQA-USMLE) and the guideline-grounded Alama Health QA underwent harmonized semantic profiling (NTD proportion, recency, readability and lexical diversity metrics) and blinded expert rating across five dimensions: clinical…
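Two of the profiling metrics named above can be sketched per benchmark as follows. NTD proportion is taken here as the fraction of QA items mentioning at least one NTD term. Lexical diversity is taken as the type-token ratio (unique tokens divided by total tokens). Both definitions are assumptions, since the abstract does not state the exact formulas. The lexicon `NTD_TERMS` and the helpers `tokenize`, `ntd_proportion` and `type_token_ratio` are hypothetical.

```python
import re

# Hypothetical NTD lexicon for illustration; not the authors' list.
NTD_TERMS = ("schistosomiasis", "leishmaniasis", "trachoma", "leprosy")

# Simple lowercase word tokenizer (letters, with an optional apostrophe suffix).
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return TOKEN_RE.findall(text.lower())


def ntd_proportion(items, terms=NTD_TERMS):
    """Fraction of QA items that mention at least one NTD term (assumed definition)."""
    if not items:
        return 0.0
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return sum(1 for item in items if pattern.search(item)) / len(items)


def type_token_ratio(items):
    """Unique tokens divided by total tokens, pooled over all items."""
    tokens = [tok for item in items for tok in tokenize(item)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


items = [
    "Which drug treats schistosomiasis?",
    "Which drug treats malaria?",
    "First-line test for trachoma?",
    "Dose of amoxicillin?",
]
print(ntd_proportion(items))    # 0.5
print(type_token_ratio(items))  # 0.8125
```

Malaria is deliberately absent from the toy lexicon because it is not classed as an NTD. The raw type-token ratio also falls as a corpus grows, so comparing benchmarks of very different sizes would likely call for a length-normalized variant or fixed-size samples.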
