Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective
Yue Zhou, Barbara Di Eugenio, Lu Cheng

TL;DR
This study evaluates large language models in healthcare, revealing significant performance and fairness challenges across demographic groups, and highlights the need for specialized research to address these limitations.
Contribution
The paper provides a comprehensive assessment of LLMs' performance and fairness in healthcare tasks, emphasizing their critical limitations and the need for targeted solutions.
Findings
LLMs face significant challenges in healthcare tasks.
Persistent demographic fairness issues are observed.
Explicitly providing demographic information yields mixed results.
Abstract
This paper studies the performance of large language models (LLMs), particularly regarding demographic fairness, in solving real-world healthcare tasks. We evaluate state-of-the-art LLMs with three prevalent learning frameworks across six diverse healthcare tasks and find significant challenges in applying LLMs to real-world healthcare tasks and persistent fairness issues across demographic groups. We also find that explicitly providing demographic information yields mixed results, while LLMs' ability to infer such details raises concerns about biased health predictions. Utilizing LLMs as autonomous agents with access to up-to-date guidelines does not guarantee performance improvement. We believe these findings reveal the critical limitations of LLMs in healthcare fairness and the urgent need for specialized research in this area.
Taxonomy
Topics: Healthcare Systems and Practices
