VLM-KG: Multimodal Radiology Knowledge Graph Generation
Abdullah Abdullah, Seong Tae Kim

TL;DR
This paper introduces VLM-KG, a multimodal vision-language model that generates radiology knowledge graphs by integrating radiology reports and images, overcoming limitations of unimodal approaches and handling long-form data.
Contribution
It presents the first multimodal framework for radiology knowledge graph generation, improving over previous unimodal methods and addressing two challenges: the specialized language of radiology reports and long-form data that exceeds limited context lengths.
Findings
Outperforms previous unimodal methods
First multimodal solution for radiology knowledge graphs
Effectively handles long-form radiology data
Abstract
Vision-Language Models (VLMs) have demonstrated remarkable success in natural language generation, excelling at instruction following and structured output generation. Knowledge graphs play a crucial role in radiology, serving as valuable sources of factual information and enhancing various downstream tasks. However, generating radiology-specific knowledge graphs presents significant challenges due to the specialized language of radiology reports and the limited availability of domain-specific data. Existing solutions are predominantly unimodal, meaning they generate knowledge graphs only from radiology reports while excluding radiographic images. Additionally, they struggle with long-form radiology data due to limited context length. To address these limitations, we propose a novel multimodal VLM-based framework for knowledge graph generation in radiology. Our approach outperforms…
