# A multi-query, multimodal, receiver-augmented solution to extract contemporary cardiology guideline information using large language models

**Authors:** Robert M Radke, Gerhard-Paul Diller, Rohan G Reddy, Pushpa Shivaram, David A Danford, Shelby Kutty

PMC · DOI: 10.1093/ehjdh/ztaf111 · 2025-09-23

## TL;DR

This paper introduces a retrieval-augmented system built on large language models that gives clinicians guideline-based cardiology recommendations backed by reference quotes. On a 306-question multiple-choice exam it outperformed GPT-3.5 and GPT-4.

## Contribution

A novel multi-query, multimodal, retrieval-augmented generation system that answers cardiology questions from European guidelines more accurately than GPT-3.5 and GPT-4 and quotes its sources for transparency.

## Key findings

- The system achieved 73.53% overall accuracy on a 306-question multiple-choice cardiology exam, compared with 44.03% for GPT-3.5 and 62.26% for GPT-4.
- The system outperformed both models in 11 question categories, including coronary artery disease, arrhythmia, valvular heart disease, cardiomyopathies, and endocarditis.
- The system supplied reference quotes for its recommendations, making them traceable to up-to-date clinical guidelines.

## Abstract

The aim of the current study was to assess the utility of a state-of-the-art large language model (LLM) based on curated, defined clinical practice recommendations to support clinicians in obtaining point-of-care guidelines for individual patient treatment while maintaining transparency.

We combined cloud-based and locally run LLMs with versatile open-source tools to form a multi-query, multimodal, retrieval-augmented generation chain that closely reflects European cardiology guidelines in its responses. We compared the performance of this generation chain to other LLMs including GPT-3.5 and GPT-4 on a 306-question multiple-choice exam with questions consisting of short patient vignettes from various cardiology subspecialties, originally written to prepare candidates for the European Exam in Core Cardiology. On the multiple-choice test, our system demonstrated overall accuracy of 73.53%, while GPT-3.5 and GPT-4 had overall accuracies of 44.03 and 62.26%, respectively. Our system outperformed GPT-3.5 and GPT-4 for the following categories of questions: coronary artery disease, arrhythmia, other, valvular heart disease, cardiomyopathies, endocarditis, adult congenital heart disease, pericardial disease, cardio-oncology, pulmonary hypertension, and non-cardiac surgery. For maximum transparency, the system also provided reference quotes for its recommendations.
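The multi-query, retrieval-augmented idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the corpus, passage ids, token-overlap scoring, and every function name below are hypothetical stand-ins, the passages are placeholder text rather than real guideline content, and the multimodal part of the chain is not covered. In a real chain the query variants would typically be generated by an LLM and retrieval would use an embedding index; here the variants are passed in by hand to keep the example self-contained.

```python
from collections import Counter

# Hypothetical stand-in corpus: placeholder text, NOT real guideline content.
PASSAGES = {
    "doc1-p1": "placeholder passage on anticoagulation in atrial fibrillation",
    "doc1-p2": "placeholder passage on rate control in atrial fibrillation",
    "doc2-p1": "placeholder passage on valve replacement in aortic stenosis",
}


def score(query, passage):
    """Toy relevance score: number of tokens shared by query and passage."""
    q = Counter(query.lower().split())
    p = Counter(passage.lower().split())
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query, k=2):
    """Return up to k passage ids with a non-zero score, best first."""
    ranked = sorted(PASSAGES, key=lambda pid: score(query, PASSAGES[pid]), reverse=True)
    return [pid for pid in ranked[:k] if score(query, PASSAGES[pid]) > 0]


def multi_query_retrieve(query_variants, k=2):
    """Run retrieval once per query variant and merge results without duplicates."""
    merged = []
    for variant in query_variants:
        for pid in retrieve(variant, k):
            if pid not in merged:
                merged.append(pid)
    return merged


def build_prompt(question, passage_ids):
    """Assemble a prompt that carries passage ids so the answer can cite its sources."""
    quotes = "\n".join(f"[{pid}] {PASSAGES[pid]}" for pid in passage_ids)
    return (
        "Answer using only the quotes below and cite their ids.\n"
        f"{quotes}\n\nQuestion: {question}"
    )


variants = ["anticoagulation atrial fibrillation", "rate control"]
ids = multi_query_retrieve(variants)
prompt = build_prompt("Which option fits this vignette?", ids)
```

Merging the results of several rephrasings widens recall compared with a single query, and keeping the passage ids in the prompt is one simple way to let the generated answer quote the text it relied on.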

Our system demonstrated superior performance in question-answering tasks on a set of core cardiology questions as compared with contemporary publicly available chat models. The current study illustrates that LLMs can be tailored to provide documented and accountable guideline recommendations towards actual clinical needs while ensuring recommendations are derived from up-to-date, trustable, and traceable documents.

## Linked entities

- **Diseases:** coronary artery disease (MONDO:0005010), arrhythmia (MONDO:0007263), cardiomyopathies (MONDO:0004994), endocarditis (MONDO:0005025), pulmonary hypertension (MONDO:0005149)

## Full-text entities

- **Diseases:** valvular heart disease (MESH:D006349), arrhythmia (MESH:D001145), pericardial disease (MESH:D008476), pulmonary hypertension (MESH:D006976), endocarditis (MESH:D004696), congenital heart disease (MESH:D006330), cardiomyopathies (MESH:D009202), coronary artery disease (MESH:D003324)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12629642/full.md

---
Source: https://tomesphere.com/paper/PMC12629642