Adherence of Free-Tier Large Language Models to the 2024 European Society of Cardiology (ESC) Guidelines for the Management of Elevated Blood Pressure and Hypertension: A Comparative Study

Aleksander Polus; Dawid Boczkowski; Rania Suleiman; Bartosz Palacz; Natalia Marianna Kubis; Julia Anna Wrona; Wiktor Perz; Maria Magdalena Teper; Anhelina Korolchuk; Jedrzej Piotrowski; Anna Gluzicka; Anna Matyas; Aleksander Tuteja; Piotr Sawina; Aleksandra Wielochowska

PMC · DOI: 10.7759/cureus.104111 · February 23, 2026

Open Access

TL;DR

This study compares how closely three free-tier large language models follow the 2024 European Society of Cardiology guidelines for managing elevated blood pressure and hypertension.

Contribution

First comparative analysis of free-tier LLMs' adherence to the 2024 ESC hypertension guidelines using physician-verified questions.

Findings

01. All three LLMs showed high accuracy, with no significant differences in guideline adherence.

02. Claude 4.5 Sonnet had the highest accuracy at 82.5%.

03. Models exhibited a tendency toward overly aggressive clinical recommendations.

Abstract

Background: Hypertension remains the leading modifiable risk factor for cardiovascular disease and premature death worldwide. In 2024, the European Society of Cardiology (ESC) released updated guidelines for the management of elevated blood pressure and hypertension. Concurrently, the integration of artificial intelligence into healthcare has accelerated, with large language models (LLMs) becoming accessible tools for information retrieval.

Objective: This study aims to evaluate and compare the accuracy and adherence of three popular free-tier LLMs (ChatGPT-5.2, Gemini 3 Flash, and Claude 4.5 Sonnet) in responding to questions based strictly on the 2024 ESC Guidelines.

Methods: We conducted a comparative cross-sectional study in January 2026 to evaluate the performance of three LLMs. The primary source of ground truth was the 2024 ESC Guidelines. A dataset of 40 specific questions was…

Linked entities

Diseases (3): premature death, cardiovascular disease, elevated blood pressure

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Electronic Health Records Systems