MedG-KRP: Medical Graph Knowledge Representation Probing

Gabriel R. Rosenbaum; Lavender Yao Jiang; Ivaxi Sheth; Jaden Stryker; Anton Alyakin; Daniel Alexander Alber; Nicolas K. Goff; Young Joon Fred Kwon; John Markert; Mustafa Nasir-Moin; Jan Moritz Niehues; Karl L. Sangwon; Eunice Yang; and Eric Karl Oermann

arXiv:2412.10982·cs.AI·December 18, 2024

TL;DR

This paper introduces a knowledge graph-based method to evaluate and visualize the biomedical reasoning abilities of large language models, aiming to improve their reliability for clinical applications.

Contribution

It presents a novel approach using medical knowledge graphs to assess and interpret LLMs' reasoning in biomedical contexts, addressing limitations of traditional benchmarks.

Findings

01 GPT-4 performs best in human review but worst in ground truth comparison.

02 PalmyraMed-70b shows the opposite pattern: worst in human review but best in ground truth comparison.

03 The method enables visualization of LLMs' reasoning pathways in medicine.

Abstract

Large language models (LLMs) have recently emerged as powerful tools, finding many medical applications. LLMs' ability to coalesce vast amounts of information from many sources to generate a response (a process similar to that of a human expert) has led many to see potential in deploying LLMs for clinical use. However, medicine is a setting where accurate reasoning is paramount. Many researchers are questioning the effectiveness of multiple-choice question answering (MCQA) benchmarks, frequently used to test LLMs. Researchers and clinicians alike must have complete confidence in LLMs' abilities for them to be deployed in a medical setting. To address this need for understanding, we introduce a knowledge graph (KG)-based method to evaluate the biomedical reasoning abilities of LLMs. Essentially, we map how LLMs link medical concepts in order to better understand how they reason. We test…

Code & Models

Repositories

nyuolab/medg-krp
PyTorch · Official

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax