Can Large Language Models Reliably Extract Physiology Index Values from Coronary Angiography Reports?
Sofia Morgado, Filipa Valdeira, Niklas Sander, Diogo Ferreira, Marta Vilela, Miguel Menezes, Cláudia Soares

TL;DR
This study evaluates the effectiveness of various Large Language Models in extracting physiological measurements from Portuguese coronary angiography reports, highlighting the potential and limitations of current models.
Contribution
First large-scale investigation of LLMs for extracting physiology indexes from Portuguese CAG reports, exploring different prompting strategies and evaluation methods.
Findings
Llama with zero-shot prompting achieved the best results.
GPT-OSS showed high robustness to prompt variations.
Constrained generation slightly decreased performance but enabled template adherence.
Abstract
Coronary angiography (CAG) reports contain clinically relevant physiological measurements, yet this information is typically recorded as unstructured natural language, which limits its use in research. We investigate the use of Large Language Models (LLMs) to automatically extract these values, along with their anatomical locations, from Portuguese CAG reports. To our knowledge, this study is the first to address the extraction of physiology indexes from a large corpus of CAG reports (1342 reports), and one of the few focusing on CAG or Portuguese clinical text. We explore local, privacy-preserving general-purpose and medical LLMs under different settings. Prompting strategies include zero-shot, few-shot, and few-shot prompting with implausible examples. In addition, we apply constrained generation and introduce a post-processing step based on RegEx. Given the sparsity of measurements, we…
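The abstract mentions a RegEx-based post-processing step without describing it. As a rough illustration of what such a step could look like, and not the authors' method, the Python sketch below pulls index/value pairs out of free text, normalizes decimal commas, and discards values outside a plausible range. The index names (FFR, iFR, RFR), the 0 to 1 range, the 30-character gap, and the function name `extract_indexes` are all assumptions made for this example.

```python
import re

# Hypothetical set of index names; the summary does not say which indexes
# the paper targets, so these are assumptions for illustration only.
CANONICAL = {"ffr": "FFR", "ifr": "iFR", "rfr": "RFR"}

# An index name, then up to 30 non-digit characters (e.g. a vessel name),
# then a value written with either a decimal point or a decimal comma.
INDEX_PATTERN = re.compile(
    r"\b(?P<index>FFR|iFR|RFR)\b\D{0,30}?(?P<value>[01][.,]\d{1,3})",
    re.IGNORECASE,
)


def extract_indexes(text: str) -> list[dict]:
    """Return index/value pairs found in text, keeping only plausible values."""
    results = []
    for match in INDEX_PATTERN.finditer(text):
        # Portuguese text commonly uses a decimal comma; normalize to a point.
        value = float(match.group("value").replace(",", "."))
        # Assumed plausibility filter: keep values in [0, 1] only.
        if 0.0 <= value <= 1.0:
            name = CANONICAL[match.group("index").lower()]
            results.append({"index": name, "value": value})
    return results


if __name__ == "__main__":
    print(extract_indexes("FFR na DA de 0,78; iFR = 0.91"))
```

A filter like this can only catch values that are numerically implausible; it cannot tell whether a plausible value was actually present in the report, so it complements rather than replaces evaluation against annotated data.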
