Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models

Juseong Jin; Chang Wook Jeong

arXiv:2410.09750·cs.CV·October 15, 2024·2 cites


TL;DR

Surgical-LLaVA is a specialized large vision-language model designed to understand and interact with surgical images and videos, enhancing multi-modal communication in surgical contexts.

Contribution

The paper introduces Surgical-LLaVA, a novel LVLM tailored for surgical scenarios. It integrates visual representations of surgical images and videos into the language model's feature space and is fine-tuned on surgical instruction-following data.
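To make "fine-tuning on instruction-following data" concrete, the sketch below shows one common way (used in LLaVA-style training) of turning a single image-question-answer sample into training targets: the prompt tokens are masked so that only the answer tokens contribute to the loss. This is a hypothetical illustration, not the authors' pipeline. The whitespace tokenizer, the `<image>` placeholder, the sample text, and the helper names `build_vocab` and `encode_sample` are all assumptions; `IGNORE_INDEX = -100` matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
# Hypothetical sketch: converting one surgical instruction-following sample into
# (input_ids, labels) for supervised fine-tuning, LLaVA-style.
# Assumptions: toy whitespace tokenizer, "<image>" placeholder token,
# IGNORE_INDEX = -100 (PyTorch CrossEntropyLoss default ignore_index).
from typing import Dict, List, Tuple

IGNORE_INDEX = -100


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to each whitespace-separated token, in first-seen order."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode_sample(question: str, answer: str,
                  vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Return input ids and labels; prompt positions are masked out of the loss."""
    prompt_ids = [vocab[t] for t in ("<image> " + question).split()]
    answer_ids = [vocab[t] for t in answer.split()]
    input_ids = prompt_ids + answer_ids
    # Only answer tokens are supervised; the image placeholder and question are ignored.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels


if __name__ == "__main__":
    q = "Which instrument is visible?"          # hypothetical question
    a = "A grasper retracts the gallbladder."   # hypothetical answer
    vocab = build_vocab(["<image> " + q, a])
    ids, labels = encode_sample(q, a, vocab)
    print(ids)
    print(labels)
```

Masking the prompt keeps the model from being trained to reproduce the question itself, so the gradient signal comes entirely from the surgical answer.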

Findings

1. Demonstrates impressive multi-modal chat abilities in surgical contexts
2. Achieves superior performance on surgical visual question-answering datasets
3. Displays potential for handling complex surgical scenarios

Abstract

Conversation agents powered by large language models are revolutionizing the way we interact with visual data. Recently, large vision-language models (LVLMs) have been extensively studied for both images and videos. However, these studies typically focus on common scenarios. In this work, we introduce an LVLM specifically designed for surgical scenarios. We integrate visual representations of surgical images and videos into the language feature space. Consequently, we establish an LVLM, Surgical-LLaVA, fine-tuned on instruction-following data for surgical scenarios. Our experiments demonstrate that Surgical-LLaVA exhibits impressive multi-modal chat abilities in surgical contexts, occasionally displaying multi-modal behaviors on unseen instructions. We conduct a quantitative evaluation on visual question-answering datasets for surgical scenarios. The results show superior…
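The phrase "integrate visual representations of surgical images and videos into the language feature space" can be illustrated with a minimal sketch. It assumes a design common in related work: per-frame patch features from a vision encoder are average-pooled over time (similar in spirit to Video-ChatGPT's pooling) and then mapped into the LLM's embedding dimension by a single linear projection (as in the original LLaVA). Whether Surgical-LLaVA uses exactly this pooling and projector is not stated here. The dimensions, weights, and function names are hypothetical, and pure Python lists stand in for tensors.

```python
# Hypothetical sketch: pooling video-frame features over time and projecting
# them into the language model's embedding space with a linear layer.
# Assumptions: mean pooling over frames, a single linear projector, toy sizes.
from typing import List

Matrix = List[List[float]]


def temporal_mean_pool(frames: List[Matrix]) -> Matrix:
    """(T, num_patches, d_vis) -> (num_patches, d_vis), averaging over frames T."""
    T = len(frames)
    num_patches = len(frames[0])
    return [
        [sum(frame[p][d] for frame in frames) / T for d in range(len(frames[0][p]))]
        for p in range(num_patches)
    ]


def linear_project(features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """(num_tokens, d_vis) @ (d_vis, d_lm) + bias -> (num_tokens, d_lm)."""
    d_lm = len(bias)
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) + bias[j] for j in range(d_lm)]
        for f in features
    ]


def video_to_llm_tokens(frames: List[Matrix], weight: Matrix, bias: List[float]) -> Matrix:
    """Produce visual tokens that live in the same space as the LLM's word embeddings."""
    return linear_project(temporal_mean_pool(frames), weight, bias)


if __name__ == "__main__":
    # Two frames, two patches each, d_vis = 2 (hypothetical values).
    frames = [[[1.0, 2.0], [3.0, 4.0]],
              [[3.0, 4.0], [5.0, 6.0]]]
    weight = [[1.0, 0.0, 1.0],   # d_vis = 2 -> d_lm = 3 (hypothetical)
              [0.0, 1.0, 1.0]]
    bias = [0.0, 0.0, 0.5]
    print(video_to_llm_tokens(frames, weight, bias))
```

The resulting visual tokens would then be concatenated with the text-token embeddings of the instruction before being fed to the language model; a single still image corresponds to the special case of one frame.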


Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · AI in cancer detection