COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain

Dimitrios P. Panagoulias; Persephone Papatheodosiou; Anastasios P. Palamidas; Mattheos Sanoudos; Evridiki Tsoureli-Nikita; Maria Virvou; George A. Tsihrintzis

arXiv:2405.10893 · cs.CL · May 20, 2024 · 2 cites

TL;DR

This paper introduces COGNET-MD, a new evaluation framework and dataset for benchmarking Large Language Models in the medical domain, focusing on interpretative ability and safety through a challenging scoring system and expert-constructed MCQs.

Contribution

It presents a novel, domain-specific benchmark with a scoring framework and a curated MCQ dataset for assessing LLMs in medical contexts, including multiple specialties.

Findings

1. Benchmark includes diverse medical domains.
2. MCQ dataset constructed with medical experts.
3. Framework emphasizes interpretative difficulty and safety.

Abstract

Large Language Models (LLMs) are a rapidly evolving, state-of-the-art Artificial Intelligence (AI) technology that promises to aid medical diagnosis, either by assisting doctors or, in more advanced and complex implementations, by simulating a doctor's workflow. In this technical paper, we outline the Cognitive Network Evaluation Toolkit for Medical Domains (COGNET-MD), a novel benchmark for LLM evaluation in the medical domain. Specifically, we propose a scoring framework with increased difficulty to assess the ability of LLMs to interpret medical text. The proposed framework is accompanied by a database of Multiple Choice Quizzes (MCQs). To ensure alignment with current medical trends and enhance safety, usefulness, and applicability, these MCQs have been constructed in collaboration with several associated medical experts in various…

Code & Models

Datasets

DimitriosPanagoulias/COGNET-MD
dataset · 13 downloads

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging