Disaggregated Health Data in LLMs: Evaluating Data Equity in the Context of Asian American Representation
Uvini Balasuriya Mudiyanselage, Bharat Jayprakash, Kookjin Lee, K. Hazel Kwon

TL;DR
This paper evaluates whether large language models can accurately and equitably provide disaggregated health information for Asian American sub-ethnic groups, addressing representation and data equity issues.
Contribution
The paper introduces a framework for assessing data disaggregation and equity in LLM outputs, specifically for diverse Asian American populations.
Findings
LLMs often fail to provide properly disaggregated health data.
Significant disparities exist in LLM responses across sub-ethnic groups.
The study highlights the need for improved data representation in LLM training.
Abstract
Large language models (LLMs), such as ChatGPT and Claude, have emerged as essential tools for information retrieval, often serving as alternatives to traditional search engines. However, ensuring that these models provide accurate and equitable information tailored to diverse demographic groups remains an important challenge. This study investigates the capability of LLMs to retrieve disaggregated health-related information for sub-ethnic groups within the Asian American population, such as Korean and Chinese communities. Data disaggregation has been a critical practice in health research to address inequities, making it an ideal domain for evaluating representation equity in LLM outputs. We apply a suite of statistical and machine learning tools to assess whether LLMs deliver appropriately disaggregated and equitable information. By focusing on Asian American sub-ethnic groups, a…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Health Literacy and Information Accessibility
