Can LLMs Help Allocate Public Health Resources? A Case Study on Childhood Lead Testing
Mohamed Afane, Ying Wang, and Juntao Chen

TL;DR
This study evaluates whether large language models can effectively allocate public health resources for childhood lead testing, revealing significant limitations in their reasoning and data retrieval capabilities.
Contribution
The paper introduces a structured evaluation of LLMs' ability to allocate health resources based on vulnerability indicators, highlighting their current shortcomings.
Findings
LLMs often overlooked high-priority neighborhoods with high lead prevalence.
LLM allocation accuracy averaged 0.46 and peaked at 0.66 with ChatGPT 5 Deep Research.
LLMs frequently cited outdated data and relied on non-empirical narratives.
Abstract
Public health agencies face critical challenges in identifying high-risk neighborhoods for childhood lead exposure with limited resources for outreach and intervention programs. To address this, we develop a Priority Score integrating untested children proportions, elevated blood lead prevalence, and public health coverage patterns to support optimized resource allocation decisions across 136 neighborhoods in Chicago, New York City, and Washington, D.C. We leverage these allocation tasks, which require integrating multiple vulnerability indicators and interpreting empirical evidence, to evaluate whether large language models (LLMs) with agentic reasoning and deep research capabilities can effectively allocate public health resources when presented with structured allocation scenarios. LLMs were tasked with distributing 1,000 test kits within each city based on neighborhood vulnerability…
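To make the allocation task concrete, here is a minimal sketch of how a score built from the three indicators could drive the distribution of 1,000 kits within one city. The abstract names the indicators but not the formula, so the equal weights, the `priority_score` and `allocate_kits` helpers, the proportional (largest-remainder) split, and the neighborhood values below are all illustrative assumptions, not the authors' method or data.

```python
from math import floor


def priority_score(untested, prevalence, coverage_gap, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical score: weighted sum of three indicators, each scaled to [0, 1].

    The equal weights are an assumption; the paper's actual formula is not
    given in the abstract.
    """
    w_u, w_p, w_c = weights
    return w_u * untested + w_p * prevalence + w_c * coverage_gap


def allocate_kits(scores, total_kits=1000):
    """Split total_kits across neighborhoods in proportion to their scores.

    Uses the largest-remainder method so the integer allocations sum
    exactly to total_kits.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    quotas = {name: total_kits * s / total for name, s in scores.items()}
    alloc = {name: floor(q) for name, q in quotas.items()}
    leftover = total_kits - sum(alloc.values())
    # Give the remaining kits to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Made-up indicator values for three placeholder neighborhoods.
scores = {
    "A": priority_score(untested=0.6, prevalence=0.08, coverage_gap=0.5),
    "B": priority_score(untested=0.3, prevalence=0.02, coverage_gap=0.2),
    "C": priority_score(untested=0.1, prevalence=0.01, coverage_gap=0.1),
}
allocation = allocate_kits(scores, total_kits=1000)
print(allocation)
```

A reference allocation of this kind is what an LLM's proposed distribution could be compared against; how the paper computes its accuracy figure is not stated in this summary.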
