# Mobility Functional Status Ascertainment in Electronic Health Records using Large Language Models

**Authors:** Xingyi Liu, Muskan Garg, Heling Jia, Jennifer St. Sauver, Sandeep R. Pagali, Sunghwan Sohn

PMC · DOI: 10.21203/rs.3.rs-7104310/v1 · 2025-07-29

## TL;DR

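This paper uses the open-source Llama 3 model to extract and standardize mobility functional status from clinical notes in electronic health records. At the patient level, it reports micro-average accuracy of 0.952 for mobility extraction and 0.912 for impairment classification when clinically reasonable inferences are counted as correct.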

## Contribution

The paper introduces the use of LLMs to extract and standardize mobility functional status from clinical notes in EHRs.

## Key findings

- Mobility Extraction achieves a patient-level micro-average accuracy of 0.952 and an F1-score of 0.962 when clinically reasonable inferences are counted as correct.
- Impairment Classification achieves a patient-level micro-average accuracy of 0.912 and an F1-score of 0.948 under the same scoring.
- A local, deterministic setup yields consistent outputs and safeguards privacy, and the approach generalizes across the three participating institutions.
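The micro-averaged figures above pool decisions across all labels before computing a score. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes each patient has a set of gold labels and a set of predicted labels; the label names (`walking`, `transfer`) and the function `micro_metrics` are hypothetical and do not come from the paper.

```python
from typing import Dict, Iterable, List, Set


def micro_metrics(
    gold: List[Set[str]],
    pred: List[Set[str]],
    labels: Iterable[str],
) -> Dict[str, float]:
    """Pool per-label decisions over all patients, then score once.

    gold[i] and pred[i] are the label sets for patient i.
    The label vocabulary is an assumption for illustration only.
    """
    tp = fp = fn = tn = 0
    label_list = list(labels)
    for g, p in zip(gold, pred):
        for label in label_list:
            in_gold, in_pred = label in g, label in p
            if in_gold and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
            else:
                tn += 1

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data with hypothetical labels; three patients.
example_gold = [{"walking"}, {"walking", "transfer"}, set()]
example_pred = [{"walking"}, {"walking"}, {"transfer"}]
example_scores = micro_metrics(example_gold, example_pred, ["walking", "transfer"])
```

Because counts are pooled before dividing, frequent labels influence a micro-average more than rare ones, which is worth keeping in mind when comparing it with a macro-average.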

## Abstract

With global aging, assessing functional status is vital for precision medicine. Electronic Health Records (EHRs), particularly unstructured data, hold abundant information on patient mobility. This study explores using Large Language Models (LLMs) to extract and standardize mobility status from unstructured EHR data (i.e., clinical notes). We annotated 600 clinical notes from three health care institutions located in southeastern Minnesota and west-central Wisconsin, focusing on expressions of mobility and associated impairment. Leveraging the open-source Llama 3 model, we tested various prompting strategies (zero-shot, few-shot, and task decomposition) and evaluated their performance. Error analysis showed that while the model sometimes inferred impairments without explicit evidence, most errors were clinically reasonable, often reflecting borderline or ambiguous cases. When reasonable inferences are counted as correct, patient-level Mobility Extraction achieves a micro-average accuracy of 0.952 with an F1-score of 0.962, and Impairment Classification achieves a micro-average accuracy of 0.912 with an F1-score of 0.948. A local, deterministic setup improved trustworthiness by ensuring consistent outputs, safeguarding privacy, and demonstrating cross-institution generalizability. These findings highlight the feasibility of LLM-based solutions for extracting mobility functional status from unstructured EHR data, supporting both clinical applications and research.
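The abstract names zero-shot and few-shot prompting but does not give the prompts themselves. The sketch below shows only the general difference between the two strategies. The instruction text, the `Note:`/`Answer:` layout, the example notes, and the function `build_prompt` are all hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def build_prompt(
    note: str,
    instruction: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt for one clinical note.

    With no examples this is a zero-shot prompt; with worked
    (note, answer) pairs it becomes a few-shot prompt.
    The layout is an illustrative assumption, not the authors' format.
    """
    parts = [instruction.strip()]
    for example_note, example_answer in examples:
        parts.append(f"Note: {example_note}\nAnswer: {example_answer}")
    # The target note goes last, with the answer left open for the model.
    parts.append(f"Note: {note}\nAnswer:")
    return "\n\n".join(parts)


# Hypothetical instruction and demonstrations, for illustration only.
INSTRUCTION = "List every mention of patient mobility in the text below."
DEMOS = [
    ("Patient ambulates with a walker.", "ambulates with a walker"),
    ("No complaints today.", "none"),
]

zero_shot = build_prompt("Needs help getting out of bed.", INSTRUCTION)
few_shot = build_prompt("Needs help getting out of bed.", INSTRUCTION, DEMOS)
```

Task decomposition could reuse the same helper by calling it once per sub-task, for example one instruction for extraction and a second for impairment classification; that split is also an assumption here.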

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12324587/full.md

---
Source: https://tomesphere.com/paper/PMC12324587