Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision

Soham Walimbe; Britty Baby; Vinkle Srivastav; Nicolas Padoy

arXiv:2507.05020 · cs.CV · July 11, 2025

TL;DR

This paper introduces MML-SurgAdapt, a multi-task surgical computer vision framework that uses vision-language models and single positive multi-label (SPML) learning to handle diverse tasks with incomplete annotations, reducing labeling effort and improving scalability.

Contribution

It presents the first application of SPML to multi-task surgical data, integrating multiple tasks with noisy labels using a unified VLM-based model.
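To make "integrating multiple tasks" concrete, here is a minimal sketch, assuming each task's labels are simply concatenated into one flat label space and each frame keeps a single observed positive. The task vocabularies (`TASK_LABELS`) and the helpers `build_label_space` and `to_single_positive` are hypothetical illustrations, not the paper's code or its actual label sets.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical per-task vocabularies for illustration only; the real
# MML-SurgAdapt label sets are not given in this summary.
TASK_LABELS: Dict[str, List[str]] = {
    "phase": ["preparation", "calot_triangle_dissection", "clipping_cutting"],
    "cvs": ["two_structures", "hepatocystic_triangle", "cystic_plate"],
}


def build_label_space(task_labels: Dict[str, List[str]]) -> List[str]:
    """Flatten every task's labels into one task-prefixed label space."""
    return [f"{task}:{name}" for task, names in task_labels.items() for name in names]


def to_single_positive(
    space: List[str], task: str, positives: List[str], rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Keep one randomly chosen positive; every other label stays unobserved.

    Returns (target, observed): target marks the kept positive, observed marks
    the only entry whose value is actually known.
    """
    chosen = rng.choice(positives)
    idx = space.index(f"{task}:{chosen}")
    target = [0] * len(space)
    observed = [0] * len(space)
    target[idx] = 1
    observed[idx] = 1
    return target, observed


space = build_label_space(TASK_LABELS)
# A phase frame has exactly one true label.
phase_target, phase_observed = to_single_positive(
    space, "phase", ["clipping_cutting"], random.Random(0)
)
# A CVS frame may satisfy several criteria, but only one positive is kept.
cvs_target, _ = to_single_positive(
    space, "cvs", ["two_structures", "cystic_plate"], random.Random(0)
)
```

Note that a frame annotated only for CVS carries no information about the phase labels in the shared space: those entries are unobserved rather than negative, which is the partial-annotation situation that SPML is meant to handle.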

Findings

1. Achieves performance comparable to task-specific models
2. Reduces the number of required labels by 23%
3. Outperforms existing SPML frameworks on the surgical tasks

Abstract

Surgical AI often involves multiple tasks within a single procedure, like phase recognition or assessing the Critical View of Safety in laparoscopic cholecystectomy. Traditional models, built for one task at a time, lack flexibility, requiring a separate model for each. To address this, we introduce MML-SurgAdapt, a unified multi-task framework with Vision-Language Models (VLMs), specifically CLIP, to handle diverse surgical tasks through natural language supervision. A key challenge in multi-task learning is the presence of partial annotations when integrating different tasks. To overcome this, we employ Single Positive Multi-Label (SPML) learning, which traditionally reduces annotation burden by training models with only one positive label per instance. Our framework extends this approach to integrate data from multiple surgical tasks within a single procedure, enabling effective…
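The abstract does not state which SPML loss MML-SurgAdapt uses, so the following is only a sketch of the simplest SPML baseline, "assume negative" (AN), applied to CLIP-style logits. It assumes each label has a text-prompt embedding and that the logit for a label is the scaled cosine similarity between the image embedding and that prompt embedding. The toy 2-D embeddings, the `scale` value of 10.0, and all function names are hypothetical; a real model would take embeddings from CLIP's image and text encoders.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def clip_style_logits(
    image_emb: List[float], text_embs: List[List[float]], scale: float = 10.0
) -> List[float]:
    """One logit per label: scaled cosine similarity to that label's prompt embedding."""
    return [scale * cosine(image_emb, t) for t in text_embs]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def assume_negative_loss(logits: List[float], positive_idx: int) -> float:
    """'Assume negative' SPML baseline: binary cross-entropy with the single
    observed positive as 1 and every unobserved label treated as 0,
    averaged over labels."""
    total = 0.0
    for j, z in enumerate(logits):
        total -= log_sigmoid(z) if j == positive_idx else log_sigmoid(-z)
    return total / len(logits)


# Toy example: the image embedding matches the first prompt exactly.
image = [1.0, 0.0]
prompts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
logits = clip_style_logits(image, prompts)  # [10.0, 0.0, -10.0]
```

A per-label sigmoid is used instead of CLIP's softmax because a frame can carry several true labels at once. The weakness of AN is that it turns every unobserved true label into a false negative (for example, a CVS-annotated frame pushes its true phase label toward 0), which is the gap that dedicated SPML methods try to close.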

Taxonomy

Topics: Medical Imaging and Analysis · Surgical Simulation and Training · Advanced X-ray and CT Imaging