D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models

Duygu Altinok

arXiv:2405.04170 · cs.CL · May 8, 2024

TL;DR

This paper evaluates the inference capabilities of various large language models in the medical domain, specifically using clinical trial reports to assess their accuracy, reasoning, and handling of medical abbreviations.

Contribution

It provides the first comprehensive analysis of LLMs' natural language inference performance in the medical domain using clinical data.

Findings

01

Gemini achieved a test F1-score of 0.748

02

LLMs struggle with medical abbreviations and numerical reasoning

03

First thorough evaluation of LLM inference in medical context
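
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how an F1-score can be computed for a two-label inference task. It is not the paper's evaluation code. The label names `Entailment` and `Contradiction`, the choice of `Entailment` as the positive class, the function name `f1_score`, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def f1_score(gold: Sequence[str], pred: Sequence[str],
             positive: str = "Entailment") -> float:
    """Binary F1 for the `positive` label (label names are assumed here)."""
    pairs = list(zip(gold, pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted positive but actually another label.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: actually positive but predicted another label.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Invented toy labels, not data from the paper.
gold = ["Entailment", "Contradiction", "Entailment", "Contradiction"]
pred = ["Entailment", "Entailment", "Contradiction", "Contradiction"]
print(f1_score(gold, pred))
```

On this toy input there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1-score is 0.5.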

Abstract

Large language models (LLMs) have garnered significant attention and widespread usage due to their impressive performance in various tasks. However, they are not without their own set of challenges, including issues such as hallucinations, factual inconsistencies, and limitations in numerical-quantitative reasoning. Evaluating LLMs in miscellaneous reasoning tasks remains an active area of research. Prior to the breakthrough of LLMs, Transformers had already proven successful in the medical domain, effectively employed for various natural language understanding (NLU) tasks. Following this trend, LLMs have also been trained and utilized in the medical domain, raising concerns regarding factual accuracy, adherence to safety protocols, and inherent limitations. In this paper, we focus on evaluating the natural language inference capabilities of popular open-source and closed-source LLMs…

Code & Models

Repositories

duygua/semeval2024_nli4ct
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies

Methods: Sparse Evolutionary Training · Focus