Adherence of Free-Tier Large Language Models to the 2024 European Society of Cardiology (ESC) Guidelines for the Management of Elevated Blood Pressure and Hypertension: A Comparative Study
Aleksander Polus, Dawid Boczkowski, Rania Suleiman, Bartosz Palacz, Natalia Marianna Kubis, Julia Anna Wrona, Wiktor Perz, Maria Magdalena Teper, Anhelina Korolchuk, Jedrzej Piotrowski, Anna Gluzicka, Anna Matyas, Aleksander Tuteja, Piotr Sawina, Aleksandra Wielochowska

TL;DR
This study compares how well three free-tier large language models follow the 2024 European Society of Cardiology guidelines for managing elevated blood pressure and hypertension.
Contribution
First comparative analysis of free-tier LLMs' adherence to the 2024 ESC hypertension guidelines using physician-verified questions.
Findings
All three LLMs showed high accuracy with no significant differences in guideline adherence.
Claude 4.5 Sonnet had the highest accuracy at 82.5%.
Models exhibited a tendency toward overly aggressive clinical recommendations.
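To put the headline figure in context, the arithmetic can be sketched as follows. This is a minimal illustration, assuming each of the 40 questions was scored as simply correct or incorrect (the scoring scheme is not visible in the truncated abstract). The Wilson score interval is a choice made here for illustration and is not necessarily the statistical method the authors used; the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


N_QUESTIONS = 40            # dataset size stated in the abstract
REPORTED_ACCURACY = 0.825   # Claude 4.5 Sonnet, from the Findings

# Assumption: binary scoring, so accuracy = correct / total.
correct = round(REPORTED_ACCURACY * N_QUESTIONS)
low, high = wilson_interval(correct, N_QUESTIONS)

print(f"{correct}/{N_QUESTIONS} correct")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions, 82.5% corresponds to 33 of 40 questions, and the interval runs from roughly 68% to 91%. An interval that wide on a 40-question set is consistent with the reported absence of significant differences between the three models.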
Abstract
Background: Hypertension remains the leading modifiable risk factor for cardiovascular disease and premature death worldwide. In 2024, the European Society of Cardiology (ESC) released updated guidelines for the management of elevated blood pressure and hypertension. Concurrently, the integration of artificial intelligence into healthcare has accelerated, with large language models (LLMs) becoming accessible tools for information retrieval.
Objective: This study aims to evaluate and compare the accuracy and adherence of three popular free-tier LLMs (ChatGPT-5.2, Gemini 3 Flash, and Claude 4.5 Sonnet) in responding to questions based strictly on the 2024 ESC Guidelines.
Methods: We conducted a comparative cross-sectional study in January 2026 to evaluate the performance of three LLMs. The primary source of ground truth was the 2024 ESC Guidelines. A dataset of 40 specific questions was…
Figure 1
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Electronic Health Records Systems
