Evaluating the Utility of Personal Health Records in Personalized Health AI

Rory Sayres; Kejia Chen; Ayush Jain; Matthew Thompson; Jonathan Richina; Xiang Yin; Jimmy Hu; Fan Zhang; Bob Lou; Mike Sanchez; Ines Mezerreg; Meredith Schreier; Hamsa Subramaniam; I-Ching Lee; Yugang Jia; Daniel Mcduff; Yossi Matias; Avinatan Hassidim; Dale Webster; Yun Liu; Jackie Barr; Quang Duong

arXiv:2605.18937·cs.AI·May 20, 2026

TL;DR

This study evaluates whether large language models give better answers to health questions when given personal health records (PHRs) as context, reporting a significant improvement in helpfulness (p < 0.001) and potential gains in safety and relevance.

Contribution

The paper introduces an evaluation framework for LLMs that use PHR data, shows improved answer quality, and identifies gaps in how LLMs understand complex health records.

Findings

01 Significant improvement in answer helpfulness with PHR data (p < 0.001)

02 Potential gains in safety, accuracy, relevance, and personalization

03 Framework identifies gaps in LLM understanding of complex PHRs

Abstract

Patient-managed Personal Health Records (PHRs) promise to empower patients to better understand their health, but the information in a record is complex, which can hinder insight. In this study, we assess the potential of large language models (LLMs; here, Gemini 3.0 Flash) to provide helpful answers to user health queries when given clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; (3) with full, extensive clinical notes. For evaluation,…
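The three context conditions described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' pipeline: the `PHR` record layout, the `ContextLevel` names, the prompt wording, and the choice to include the basic summary alongside the notes in the full condition are all hypothetical. The patient data in the usage example is made up.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContextLevel(Enum):
    """The three conditions named in the abstract (enum names are hypothetical)."""
    NONE = "none"        # (1) no PHR context
    SUMMARY = "summary"  # (2) demographics, conditions, medications
    FULL = "full"        # (3) full clinical notes


@dataclass
class PHR:
    # Hypothetical record layout; the paper's actual PHR schema is not given.
    demographics: str
    conditions: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    clinical_notes: list[str] = field(default_factory=list)


def summarize(phr: PHR) -> str:
    """Render the basic summary used in the SUMMARY condition."""
    return "\n".join([
        f"Demographics: {phr.demographics}",
        f"Conditions: {', '.join(phr.conditions) or 'none recorded'}",
        f"Medications: {', '.join(phr.medications) or 'none recorded'}",
    ])


def build_prompt(query: str, phr: PHR, level: ContextLevel) -> str:
    """Assemble the model input for one query under one context condition."""
    if level is ContextLevel.NONE:
        return f"Question: {query}"
    context = summarize(phr)
    if level is ContextLevel.FULL:
        # Assumption: the full condition adds notes on top of the summary.
        context += "\n\nClinical notes:\n" + "\n---\n".join(phr.clinical_notes)
    return f"Patient record:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Made-up example record and query.
    phr = PHR(
        demographics="54-year-old female",
        conditions=["type 2 diabetes"],
        medications=["metformin"],
        clinical_notes=["Visit note A", "Visit note B"],
    )
    for level in ContextLevel:
        print(f"== {level.value} ==")
        print(build_prompt("Is my A1c okay?", phr, level))
```

Generating all three prompts for the same query and record keeps the conditions directly comparable, since the only thing that varies is how much of the record the model sees.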
