# Enabling Episode-Level Transparency in Value-Based Care Through Large Language Model-Driven Provider Directories

**Authors:** Amol Kodan

PMC · DOI: 10.7759/cureus.102219 · Cureus · 2026-01-24

## TL;DR

This paper evaluates four large language models as the engine of a provider directory chatbot that adds episode-level cost and performance context to provider search in value-based care, using synthetic data.

## Contribution

The study introduces an LLM-driven provider directory chatbot and a revised ranking correctness metric for episode-based care navigation.

## Key findings

- All four tested LLMs achieved high episode identification accuracy (approaching 91%) on synthetic test scenarios.
- Provider ranking reliability and numeric precision varied substantially across the tested LLMs.
- The proposed ranking correctness formulation emphasizes accurate episode identification as a prerequisite for transparency.
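
The paper's exact formula is not reproduced in this summary. As a rough illustration of the idea in the last finding, the sketch below counts a provider ranking as correct only when the episode was also identified correctly. The `ScenarioResult` fields, the exact-match comparison, and the function names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScenarioResult:
    """One evaluated test scenario (all field names are hypothetical)."""
    expected_episode: str
    predicted_episode: str
    expected_ranking: List[str]   # provider IDs, best first
    predicted_ranking: List[str]


def gated_ranking_correct(r: ScenarioResult) -> bool:
    """A ranking counts only if the episode was identified correctly first."""
    if r.predicted_episode != r.expected_episode:
        return False
    return r.predicted_ranking == r.expected_ranking


def summarize(results: List[ScenarioResult]) -> Dict[str, float]:
    """Report episode accuracy plus ungated and gated ranking correctness."""
    if not results:
        raise ValueError("at least one scenario result is required")
    n = len(results)
    episode_hits = sum(r.predicted_episode == r.expected_episode for r in results)
    ungated = sum(r.predicted_ranking == r.expected_ranking for r in results)
    gated = sum(gated_ranking_correct(r) for r in results)
    return {
        "episode_accuracy": episode_hits / n,
        "ungated_ranking": ungated / n,
        "gated_ranking": gated / n,
    }
```

Under this assumption the gated score can never exceed the episode accuracy, so a model cannot earn ranking credit for a well-ordered list attached to the wrong episode.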

## Abstract

Provider directories are a cornerstone interface of the United States healthcare system, yet they remain structurally fragile, limiting transparency and constraining the effectiveness of value-based care (VBC). Conventional directory interfaces lack episode-level cost and risk context, rely on rigid search paradigms, and can contain inaccurate or incomplete information. These deficiencies hinder informed provider selection and weaken the operational impact of episode-based payment models. This study evaluates a large language model (LLM)-driven provider directory chatbot designed to support episode-based care navigation using strictly structured, synthetic cost and performance datasets. Four widely used LLMs (GPT-3.5-turbo, GPT-4o-mini, GPT-4o, and GPT-5.1) were assessed under identical deterministic conditions. Using 87 natural language test scenarios, we examined structured output validity, episode and risk-band identification, provider ranking accuracy, numeric fidelity, hallucination risk, abstention behavior, and operational performance. We further introduce a revised formulation of ranking correctness that explicitly treats accurate episode identification as a prerequisite for meaningful transparency. All models demonstrated consistently high episode identification accuracy, approaching 91%, though substantial variability was observed in downstream provider ranking reliability and numeric precision. Collectively, these preliminary findings suggest that LLM-enabled provider directories can meaningfully enhance transparency and the user experience within VBC settings, while highlighting specific performance dimensions that require optimization before large-scale deployment.
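
Three of the evaluation dimensions named in the abstract (structured output validity, numeric fidelity, and hallucination risk) lend themselves to mechanical checks against the source dataset. The sketch below shows one way such a check might look. The JSON schema (`episode`, `risk_band`, `providers`, `provider_id`, `episode_cost`), the tolerance, and the function name are assumptions made for illustration and do not describe the paper's actual harness.

```python
import json
import math
from typing import Any, Dict

# Hypothetical response schema; the paper's real schema is not given here.
REQUIRED_KEYS = {"episode", "risk_band", "providers"}


def check_response(raw: str, reference_costs: Dict[str, float],
                   rel_tol: float = 1e-6) -> Dict[str, Any]:
    """Check one model response against the reference (synthetic) dataset."""
    report: Dict[str, Any] = {
        "valid_structure": False,
        "hallucinated_providers": [],
        "numeric_mismatches": [],
    }
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return report  # not parseable: fails structured output validity
    if not isinstance(payload, dict) or not REQUIRED_KEYS.issubset(payload):
        return report
    providers = payload["providers"]
    if not isinstance(providers, list) or not all(isinstance(p, dict) for p in providers):
        return report
    report["valid_structure"] = True
    for entry in providers:
        pid = entry.get("provider_id")
        cost = entry.get("episode_cost")
        if not isinstance(pid, str) or pid not in reference_costs:
            # Provider absent from the dataset: treated as a hallucination.
            report["hallucinated_providers"].append(pid)
        elif not isinstance(cost, (int, float)) or not math.isclose(
                cost, reference_costs[pid], rel_tol=rel_tol):
            # Known provider, but the reported cost differs from the source.
            report["numeric_mismatches"].append(pid)
    return report
```

Abstention behavior and operational performance (latency, cost per query) are not covered by this sketch and would need separate handling.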

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12928537/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12928537/full.md

---
Source: https://tomesphere.com/paper/PMC12928537