VLM-KG: Multimodal Radiology Knowledge Graph Generation

Abdullah Abdullah; Seong Tae Kim

arXiv:2505.17042 · cs.CL · May 26, 2025

TL;DR

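This paper introduces VLM-KG, a multimodal framework built on a vision-language model (VLM) that generates radiology knowledge graphs from both radiology reports and radiographic images. It targets two limitations of unimodal approaches: they ignore the images, and their limited context length makes long-form radiology data hard to handle.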

Contribution

The paper presents the first multimodal framework for radiology knowledge graph generation. The framework improves over previous unimodal methods and addresses two challenges: the specialized language of radiology reports and the length of long-form radiology data.

Findings

01 Outperforms previous unimodal methods
02 First multimodal solution for radiology knowledge graphs
03 Effectively handles long-form radiology data

Abstract

Vision-Language Models (VLMs) have demonstrated remarkable success in natural language generation, excelling at instruction following and structured output generation. Knowledge graphs play a crucial role in radiology, serving as valuable sources of factual information and enhancing various downstream tasks. However, generating radiology-specific knowledge graphs presents significant challenges due to the specialized language of radiology reports and the limited availability of domain-specific data. Existing solutions are predominantly unimodal, meaning they generate knowledge graphs only from radiology reports while excluding radiographic images. Additionally, they struggle with long-form radiology data due to limited context length. To address these limitations, we propose a novel multimodal VLM-based framework for knowledge graph generation in radiology. Our approach outperforms…
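
To make the target output concrete, here is a minimal sketch of how structured model output could be turned into a knowledge graph. It assumes the model emits a JSON list of head/relation/tail objects; that format, the names `Triple`, `parse_triples`, and `to_adjacency`, and the relation labels and example findings are all hypothetical illustrations, not the paper's schema or data.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of the knowledge graph: head --relation--> tail."""
    head: str
    relation: str
    tail: str


def parse_triples(model_output: str) -> list[Triple]:
    """Parse a JSON list of {head, relation, tail} objects into Triple records."""
    triples = []
    for item in json.loads(model_output):
        # Skip malformed entries instead of failing the whole graph.
        if not isinstance(item, dict):
            continue
        if not all(isinstance(item.get(k), str) for k in ("head", "relation", "tail")):
            continue
        triples.append(Triple(item["head"], item["relation"], item["tail"]))
    return triples


def to_adjacency(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Group triples by head entity so each node lists its outgoing edges."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for t in triples:
        graph.setdefault(t.head, []).append((t.relation, t.tail))
    return graph


# Hypothetical structured output for a chest X-ray report; not taken from the paper.
example_output = """[
  {"head": "opacity", "relation": "located_at", "tail": "left lower lobe"},
  {"head": "opacity", "relation": "suggestive_of", "tail": "pneumonia"},
  {"head": "effusion"}
]"""

graph = to_adjacency(parse_triples(example_output))
```

The third entry lacks a relation and a tail, so the parser drops it. Validating each entry separately keeps one malformed triple from discarding an otherwise usable graph.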
