Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications

Arjun R. Malghan

arXiv:2501.13936·cs.AI·January 27, 2025

Open Access

TL;DR

This paper evaluates the numerical reasoning accuracy of a GPT-3-based large language model in healthcare, highlights its strengths and weaknesses on clinical numerical tasks, and proposes methods to improve reliability.

Contribution

It introduces a comprehensive evaluation of LLMs in healthcare numerical reasoning and demonstrates the effectiveness of prompt engineering and fact-checking pipelines.

Findings

01. Overall accuracy of 84.10% in numerical tasks

02. Fact-checking pipeline improved accuracy by 11%

03. Better performance in simple tasks than multi-step reasoning

Abstract

Large Language Models (LLMs) have emerged as transformative tools in the healthcare sector, demonstrating remarkable capabilities in natural language understanding and generation. However, their proficiency in numerical reasoning, particularly in high-stakes domains such as clinical applications, remains underexplored. Numerical reasoning is critical in healthcare applications, influencing patient outcomes, treatment planning, and resource allocation. This study investigates the computational accuracy of LLMs in numerical reasoning tasks within healthcare contexts. Using a curated dataset of 1,000 numerical problems, encompassing real-world scenarios such as dosage calculations and lab result interpretations, the performance of a refined LLM based on the GPT-3 architecture was evaluated. The methodology includes prompt engineering, integration of fact-checking pipelines, and application…
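To make the idea of a fact-checking step for dosage calculations concrete, here is a minimal Python sketch. It is not the paper's pipeline, which the abstract does not describe in detail. The function names (`weight_based_dose_mg`, `extract_first_number`, `check_answer`), the 1% tolerance, and the example values are all hypothetical, chosen only for illustration. The sketch recomputes a weight-based dose deterministically and compares it with the number found in a model's free-text answer.

```python
import re
from typing import Optional


def weight_based_dose_mg(weight_kg: float, dose_mg_per_kg: float) -> float:
    """Reference calculation (illustrative): total dose in mg."""
    if weight_kg <= 0 or dose_mg_per_kg <= 0:
        raise ValueError("weight and dose rate must be positive")
    return weight_kg * dose_mg_per_kg


def extract_first_number(text: str) -> Optional[float]:
    """Return the first numeric value in a free-text answer, or None.

    Assumption: the answer states the final value first; a real checker
    would need more careful parsing (units, multiple numbers).
    """
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def check_answer(model_answer: str, expected: float, rel_tol: float = 0.01) -> bool:
    """Accept the answer only if it is within rel_tol of the reference value."""
    value = extract_first_number(model_answer)
    if value is None:
        return False
    return abs(value - expected) <= rel_tol * abs(expected)


# Hypothetical example: 15 mg/kg for a patient weighing 70 kg.
expected_mg = weight_based_dose_mg(weight_kg=70, dose_mg_per_kg=15)
is_ok = check_answer("The patient should receive 1,050 mg.", expected_mg)
```

A check like this only covers problems that have a closed-form reference calculation. Multi-step reasoning tasks, which the findings report as harder, would need a separate reference computation for each step.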

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare