PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark
Adil Bahaj, Oumaima Fadi, Mohamed Chetouani, Mounir Ghogho

TL;DR
PediatricsMQA is a new multi-modal benchmark designed to evaluate and improve the performance of AI models on pediatric medical question-answering tasks, addressing age bias and representation issues.
Contribution
The paper introduces a comprehensive pediatric QA dataset covering text and vision modalities across developmental stages, and uses it to highlight performance gaps in current models.
Findings
Significant performance drops in younger age groups
Existing models show age bias in pediatric tasks
Dataset covers diverse pediatric topics and imaging modalities
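One way to surface the age-related performance drop described above is to break multiple-choice accuracy down by developmental stage. The following is a minimal sketch and not the authors' evaluation code: the record fields (`stage`, `answer`, `prediction`), the function name `accuracy_by_stage`, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_stage(records):
    """Return MCQ accuracy per developmental stage.

    `records` is an iterable of dicts with hypothetical keys:
    'stage' (stage label), 'answer' (gold option), 'prediction' (model option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["stage"]] += 1
        if record["prediction"] == record["answer"]:
            correct[record["stage"]] += 1
    return {stage: correct[stage] / total[stage] for stage in total}


# Toy records, invented for illustration only (not benchmark data).
records = [
    {"stage": "prenatal", "answer": "A", "prediction": "B"},
    {"stage": "prenatal", "answer": "C", "prediction": "C"},
    {"stage": "adolescent", "answer": "B", "prediction": "B"},
    {"stage": "adolescent", "answer": "D", "prediction": "D"},
]
scores = accuracy_by_stage(records)
```

Comparing the per-stage values in `scores` makes any gap between younger and older groups visible directly, rather than hiding it inside a single aggregate accuracy.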
Abstract
Large language models (LLMs) and vision-augmented LLMs (VLMs) have significantly advanced medical informatics, diagnostics, and decision support. However, these models exhibit systematic biases, particularly age bias, compromising their reliability and equity. This is evident in their poorer performance on pediatric-focused text and visual question-answering tasks. This bias reflects a broader imbalance in medical research, where pediatric studies receive less funding and representation despite the significant disease burden in children. To address these issues, a new comprehensive multi-modal pediatric question-answering benchmark, PediatricsMQA, has been introduced. It consists of 3,417 text-based multiple-choice questions (MCQs) covering 131 pediatric topics across seven developmental stages (prenatal to adolescent) and 2,067 vision-based MCQs using 634 pediatric images from 67…
