GP-VLS: A general-purpose vision language model for surgery

Samuel Schmidgall; Joseph Cho; Cyril Zakka; William Hiesinger

arXiv:2407.19305 · cs.CV · August 8, 2024 · 3 cites

Open Access

TL;DR

GP-VLS is a versatile vision-language model for surgery that integrates medical knowledge and visual understanding, enabling broad surgical AI applications and outperforming existing models on multiple benchmarks.

Contribution

The paper introduces GP-VLS, a general-purpose surgical vision-language model trained on six new datasets, together with SurgiQual, a new benchmark that evaluates models on medical and surgical knowledge as well as surgical vision-language questions.

Findings

1. GP-VLS outperforms existing models by 8-21% on surgical benchmarks.

2. GP-VLS demonstrates strong performance on medical and surgical knowledge tests.

3. The model is open-source, facilitating further research and development.

Abstract

Surgery requires comprehensive medical knowledge, visual assessment skills, and procedural expertise. While recent surgical AI models have focused on solving task-specific problems, there is a need for general-purpose systems that can understand surgical scenes and interact through natural language. This paper introduces GP-VLS, a general-purpose vision language model for surgery that integrates medical and surgical knowledge with visual scene understanding. For comprehensively evaluating general-purpose surgical models, we propose SurgiQual, which evaluates across medical and surgical knowledge benchmarks as well as surgical vision-language questions. To train GP-VLS, we develop six new datasets spanning medical knowledge, surgical textbooks, and vision-language pairs for tasks like phase recognition and tool identification. We show that GP-VLS significantly outperforms existing open-…

Taxonomy

Topics: Image Retrieval and Classification Techniques