ECG-LLM -- training and evaluation of domain-specific large language models for electrocardiography

Lara Ahrens; Wilhelm Haverkamp; Nils Strodthoff

arXiv:2510.18339 · cs.CL · October 22, 2025

Open Access

TL;DR

This study examines how open-weight large language models adapted to electrocardiography literature perform across multiple-choice, automatic-metric, and LLM-as-a-judge evaluations, showing that finetuning and retrieval-augmented generation can be competitive with a proprietary general-purpose model while allowing local, privacy-preserving deployment.

Contribution

It provides a comprehensive evaluation of domain-specific LLMs for electrocardiography, highlighting effective adaptation strategies and assessment complexities.

Findings

01. Finetuned Llama 3.1 70B outperforms base models on multiple metrics.

02. Claude 3.7 and RAG excel in complex query evaluations.

03. Domain-specific models achieve competitive performance with proprietary solutions.

Abstract

Domain-adapted open-weight large language models (LLMs) offer promising healthcare applications, from queryable knowledge bases to multimodal assistants, with the crucial advantage of local deployment for privacy preservation. However, optimal adaptation strategies, evaluation methodologies, and performance relative to general-purpose LLMs remain poorly characterized. We investigated these questions in electrocardiography, an important area of cardiovascular medicine, by finetuning open-weight models on domain-specific literature and implementing a multi-layered evaluation framework comparing finetuned models, retrieval-augmented generation (RAG), and Claude Sonnet 3.7 as a representative general-purpose model. Finetuned Llama 3.1 70B achieved superior performance on multiple-choice evaluations and automatic text metrics, ranking second to Claude 3.7 in LLM-as-a-judge assessments. Human…

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education