EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning
Mingyang Wei, Dehai Min, Zewen Liu, Yuzhang Xie, Guanchen Wu, Ziyang Zhang, Carl Yang, Max S. Y. Lau, Qi He, Lu Cheng, Wei Jin

TL;DR
EpiQAL is a new benchmark designed to evaluate large language models' ability to perform epidemiological reasoning, highlighting current limitations and the need for improved inference capabilities in health-related AI systems.
Contribution
This paper introduces EpiQAL, the first comprehensive benchmark for epidemiological question answering, with diverse subsets testing factual recall, inference, and conclusion reconstruction.
Findings
Current LLMs perform poorly on epidemiological reasoning tasks.
Multi-step inference is the most challenging aspect for models.
Scaling models alone does not improve epidemiological reasoning performance.
Abstract
Reliable epidemiological reasoning requires synthesizing study evidence to infer disease burden, transmission dynamics, and intervention effects at the population level. Existing medical question answering benchmarks primarily emphasize clinical knowledge or patient-level reasoning, yet few systematically evaluate evidence-grounded epidemiological inference. We present EpiQAL, the first diagnostic benchmark for epidemiological question answering across diverse diseases, comprising three subsets built from open-access literature. The three subsets progressively test factual recall, multi-step inference, and conclusion reconstruction under incomplete information, and are constructed through a quality-controlled pipeline combining taxonomy guidance, multi-model verification, and difficulty screening. Experiments on fourteen models spanning open-source and proprietary systems reveal that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Machine Learning in Healthcare · Multimodal Machine Learning Applications
