# Converting unstructured cardiac catheterization and echocardiography reports into structured data using transformer-based language models

**Authors:** Fagen Xie, Ming-Sum Lee, Wansu Chen, Derek Q Phan

PMC · DOI: 10.1093/jamiaopen/ooag036 · JAMIA Open · 2026-03-26

## TL;DR

This study shows that transformer-based language models can accurately extract structured data from unstructured cardiac reports, improving data accessibility for research and patient care.

## Contribution

The study demonstrates the effectiveness of locally run transformer-based models for privacy-preserving clinical data extraction from cardiac reports.

## Key findings

- Both BioclinicalBERT and BART-Large-CNN achieved over 90% accuracy, precision, and recall for echocardiography and cardiac catheterization reports.
- BART-Large-CNN slightly outperformed BioclinicalBERT in cardiac catheterization data extraction metrics.
- Model performance improved with more training data but plateaued around 1000 reports.

## Abstract

Echocardiography and cardiac catheterization reports capture important clinical assessments of cardiac function and disease severity. This study explores open-source transformer-based language models (LMs), run locally within an institutional environment, as a privacy-preserving alternative to external API-based large LMs. The goal is to systematically extract clinical data from unstructured echocardiography and cardiac catheterization reports and thereby improve data accessibility for research and patient care.

Two transformer-based LMs, BioclinicalBERT and BART-Large-CNN, were fine-tuned in a secure local environment using a question-answering approach. The dataset included 3286 echocardiography and 1884 cardiac catheterization reports from Kaiser Permanente Southern California’s electronic health records, annotated for 25 and 47 predefined categories, respectively. Three hundred reports of each type were randomly selected for validation, and the remainder were used for training. Model performance was assessed using accuracy, precision, recall, and F1-score at 2 probability thresholds. The effect of training set size on model performance was also evaluated.
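The data preparation described above can be illustrated with a minimal sketch. It assumes an extractive, SQuAD-style framing in which each predefined category becomes a question and the annotated value is a span of the report text; this summary does not state the exact record format. The function names, question wording, sample report, and seed are all hypothetical.

```python
import random
from typing import Optional


def split_reports(report_ids, n_validation=300, seed=0):
    """Randomly hold out n_validation reports; the rest are used for training."""
    ids = list(report_ids)
    rng = random.Random(seed)  # seed value is an arbitrary illustration
    validation = set(rng.sample(ids, n_validation))
    train = [r for r in ids if r not in validation]
    return train, sorted(validation)


def to_qa_example(report_text: str, question: str, answer: Optional[str]) -> dict:
    """Build one SQuAD-style record (assumed format, not confirmed by the paper).

    A category that is absent from the report becomes an unanswerable question.
    """
    if answer is None:
        return {"question": question, "context": report_text,
                "answers": {"text": [], "answer_start": []}}
    start = report_text.find(answer)
    if start < 0:
        raise ValueError("annotated answer must appear verbatim in the report")
    return {"question": question, "context": report_text,
            "answers": {"text": [answer], "answer_start": [start]}}


# Hypothetical report text and category questions, for illustration only.
report = "Left ventricular ejection fraction is 55%. No pericardial effusion."
examples = [
    to_qa_example(report, "What is the left ventricular ejection fraction?", "55%"),
    to_qa_example(report, "Is there a prosthetic mitral valve?", None),
]

# 3286 echocardiography reports, 300 held out for validation.
train_ids, val_ids = split_reports(range(3286), n_validation=300, seed=0)
```

With 3286 echocardiography reports, this split leaves 2986 for training; the same call with 1884 catheterization reports would leave 1584.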

Both models achieved consistently high accuracy, precision, and recall (all >90%) across the 5 seed runs for both report types. For echocardiography, BioclinicalBERT reached a mean accuracy of 95.7%, precision of 97.6%, recall of 97.4%, and F1-score of 0.98 at the probability threshold of 0.1; BART-Large-CNN had similar results. For cardiac catheterization, BART-Large-CNN slightly outperformed BioclinicalBERT at the probability threshold of 0.1, with mean accuracy of 94.9% vs 94.3%, precision of 96.7% vs 96.3%, recall of 96.1% vs 95.7%, and F1-score of 0.96 vs 0.96. Most individual categories showed strong performance, though a few (eg, prosthetic mitral valve, right atrial pressure) had lower scores. Performance improved with more training data but plateaued around 1000 reports.
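To make the thresholded metrics concrete, here is a minimal sketch of one way they could be computed. It assumes that a predicted answer whose probability falls below the threshold is treated as "no answer", and that a wrong or spurious answer counts as a false positive; the paper's exact counting rules are not given in this summary. The `Prediction` class, the `evaluate` function, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    gold: Optional[str]    # annotated answer, None if the category is absent
    answer: Optional[str]  # model's extracted answer, None if it gave none
    prob: float            # model's probability for that answer


def evaluate(preds, threshold):
    """Score predictions after discarding answers below the probability threshold."""
    tp = fp = fn = tn = 0
    for p in preds:
        kept = p.answer if (p.answer is not None and p.prob >= threshold) else None
        if kept is None:
            if p.gold is None:
                tn += 1  # correctly left blank
            else:
                fn += 1  # missed an annotated value
        elif kept == p.gold:
            tp += 1      # correct extraction
        else:
            fp += 1      # wrong or spurious extraction (assumed rule)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(preds) if preds else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy predictions, invented for illustration.
toy = [
    Prediction(gold="55%", answer="55%", prob=0.90),          # true positive
    Prediction(gold=None, answer="mild", prob=0.05),          # dropped: true negative
    Prediction(gold="severe", answer="severe", prob=0.08),    # dropped: false negative
    Prediction(gold="normal", answer="abnormal", prob=0.70),  # false positive
]
metrics = evaluate(toy, threshold=0.1)
```

On the toy data, all four metrics come out to 0.5 at a threshold of 0.1. As a rough check, plugging the reported catheterization means into the same harmonic-mean formula gives about 0.964 for BART-Large-CNN and 0.960 for BioclinicalBERT, which both round to 0.96. The reported F1-scores are means over seed runs, so this is only an approximation.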

Fine-tuned transformer-based LMs can effectively extract structured data from unstructured cardiac reports, supporting automated information extraction to enhance research and clinical applications.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13020537/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020537/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC13020537/full.md

---
Source: https://tomesphere.com/paper/PMC13020537