Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision
Lennart Maack, Julia-Kristin Graß, Lisa-Marie Toscha, Nathaniel Melling, Alexander Schlaefer

TL;DR
This paper introduces a privacy-preserving method to distill expert surgical knowledge into local vision language models, enhancing their ability to understand and explain anatomy during complex surgeries like Complete Mesocolic Excision.
Contribution
The authors propose a novel framework for training local surgical VLMs using expert-supervised datasets generated without sensitive images, improving domain-specific understanding.
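According to the abstract, the teacher LLM never sees the surgical image: it receives only textual context plus binary segmentation masks as spatial information. This summary does not describe how a mask is turned into text, so the sketch below assumes one plausible encoding: covered area, normalized bounding box, centroid, and a coarse 3x3 grid position. The names `summarize_mask`, `build_teacher_prompt`, and `MaskSummary`, and the prompt wording, are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: binary mask as nested lists, mask[row][col] == 1
# where the anatomical structure is present.
Mask = List[List[int]]

_ROWS = ("upper", "middle", "lower")
_COLS = ("left", "center", "right")


@dataclass
class MaskSummary:
    area_fraction: float                     # share of frame pixels covered
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in [0, 1]
    centroid: Tuple[float, float]            # mean pixel-center (x, y) in [0, 1]
    region: str                              # coarse 3x3-grid position label


def _grid_region(cx: float, cy: float) -> str:
    """Map a normalized centroid to a coarse 3x3 grid label."""
    row, col = min(int(cy * 3), 2), min(int(cx * 3), 2)
    return "center" if (row, col) == (1, 1) else f"{_ROWS[row]} {_COLS[col]}"


def summarize_mask(mask: Mask) -> Optional[MaskSummary]:
    """Reduce a binary mask to a few spatial numbers; None if the mask is empty."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    cx = (sum(xs) / len(xs) + 0.5) / w
    cy = (sum(ys) / len(ys) + 0.5) / h
    return MaskSummary(
        area_fraction=len(pts) / (h * w),
        bbox=(min(xs) / w, min(ys) / h, (max(xs) + 1) / w, (max(ys) + 1) / h),
        centroid=(cx, cy),
        region=_grid_region(cx, cy),
    )


def build_teacher_prompt(context: str, structures: Dict[str, Mask]) -> str:
    """Compose a text-only teacher prompt; no pixel data from the frame is included."""
    lines = [f"Surgical context: {context}",
             "Anatomical structures (derived from binary segmentation masks):"]
    for name, mask in structures.items():
        s = summarize_mask(mask)
        if s is None:
            lines.append(f"- {name}: not visible")
            continue
        x0, y0, x1, y1 = s.bbox
        lines.append(f"- {name}: {s.region} of the frame, covers {s.area_fraction:.0%}, "
                     f"bounding box ({x0:.2f}, {y0:.2f})-({x1:.2f}, {y1:.2f})")
    lines.append("Explain which landmarks are visible and why they matter at this step.")
    return "\n".join(lines)
```

Reducing each mask to a handful of numbers keeps the prompt short and free of image content, which is the privacy property the framework relies on. A real pipeline might instead pass contours or run-length encodings; that choice is not specified here.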
Findings
Finetuning VLMs with generated datasets significantly improves surgical scene understanding.
The approach maintains privacy by avoiding sensitive image data during training.
The method is data-efficient and suitable for local deployment in clinical settings.
Abstract
Recently, Vision Large Language Models (VLMs) have demonstrated high potential in computer-aided diagnosis and decision support. However, current VLMs show deficits in domain-specific surgical scene understanding, such as identifying and explaining anatomical landmarks during Complete Mesocolic Excision. Additionally, locally deployable models are needed to avoid leaking patient data to large VLMs hosted outside the clinic. We propose a privacy-preserving framework to distill knowledge from large, general-purpose LLMs into an efficient, local VLM. We generate an expert-supervised dataset by prompting a teacher LLM without sensitive images, using only textual context and binary segmentation masks for spatial information. This dataset is used for Supervised Fine-Tuning (SFT) and subsequent Direct Preference Optimization (DPO) of the locally deployable VLM. Our evaluation…
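The abstract states that the SFT model is further trained with Direct Preference Optimization (DPO). As a hedged reference point, the sketch below computes the standard DPO loss for one preference pair from sequence log-probabilities under the trainable policy and a frozen reference model. It is the generic objective, not the authors' training code: how the chosen and rejected answers were obtained and which β was used are not stated in this summary, so `beta=0.1` and the function names `dpo_loss` and `mean_dpo_loss` are assumptions.

```python
import math
from typing import Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response given the
    prompt, under the trainable policy or the frozen reference model
    (commonly the SFT checkpoint). beta=0.1 is an assumed default.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is identical to softplus(-margin)
    return softplus(-margin)


def mean_dpo_loss(pairs: Sequence[Tuple[float, float, float, float]],
                  beta: float = 0.1) -> float:
    """Average the per-pair loss over a batch of log-probability 4-tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)
```

When the policy still matches the reference model, the margin is zero and the loss equals log 2; it falls toward zero as the policy raises the chosen answer's likelihood relative to the rejected one. In a real setup the VLM computes these log-probabilities on image-text inputs; the scalar form only isolates the objective, and `softplus` avoids overflow for large margins.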
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Surgical Simulation and Training
