Text-driven Adaptation of Foundation Models for Few-shot Surgical Workflow Analysis
Tingxuan Chen, Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy

TL;DR
This paper introduces Surg-FTDA, a method for surgical workflow analysis that needs only minimal annotated data: it aligns image embeddings with text embeddings and then uses text data alone for adaptation, improving performance across generative and discriminative tasks.
Contribution
The paper presents a new text-driven adaptation framework for surgical workflow analysis that reduces reliance on large annotated datasets and handles multiple tasks effectively.
Findings
Outperforms baseline methods in generative and discriminative tasks
Generalizes well across different surgical workflow analysis tasks
Requires minimal paired image-label data for effective adaptation
Abstract
Purpose: Surgical workflow analysis is crucial for improving surgical efficiency and safety. However, previous studies rely heavily on large-scale annotated datasets, posing challenges in cost, scalability, and reliance on expert annotations. To address this, we propose Surg-FTDA (Few-shot Text-driven Adaptation), designed to handle various surgical workflow analysis tasks with minimal paired image-label data.
Methods: Our approach has two key components. First, Few-shot selection-based modality alignment selects a small subset of images and aligns their embeddings with text embeddings from the downstream task, bridging the modality gap. Second, Text-driven adaptation leverages only text data to train a decoder, eliminating the need for paired image-text data. This decoder is then applied to aligned image embeddings, enabling image-related tasks without explicit image-text pairs.…
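The two components described in Methods can be illustrated with a toy sketch on synthetic embeddings. This is not the authors' implementation: the ridge-regression affine map used for alignment, the nearest-centroid classifier standing in for the text-trained decoder, and all names (`fit_alignment`, `train_text_classifier`, `run_demo`) and data are assumptions made for illustration only.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def add_bias(x):
    """Append a constant column so the fitted map is affine."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_alignment(img_emb, txt_emb, ridge=1e-3):
    """Assumed alignment step: ridge least squares from image to text space."""
    X = add_bias(img_emb)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ txt_emb)  # shape (d + 1, d)


def align(img_emb, W):
    """Map image embeddings into the text embedding space."""
    return add_bias(img_emb) @ W


def train_text_classifier(txt_emb, labels, num_classes):
    """Stand-in 'decoder' trained on text only: one centroid per class."""
    centroids = np.stack(
        [txt_emb[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(centroids)


def predict(centroids, emb):
    """Assign each embedding to the class with the most similar centroid."""
    return np.argmax(l2_normalize(emb) @ centroids.T, axis=1)


def run_demo(seed=0, dim=8, num_classes=4):
    rng = np.random.default_rng(seed)
    protos = 3.0 * np.eye(dim)[:num_classes]  # synthetic class prototypes
    # Synthetic modality gap: a random rotation plus a large fixed offset.
    rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    offset = np.zeros(dim)
    offset[0] = 10.0

    def text(labels):
        return protos[labels] + 0.1 * rng.normal(size=(len(labels), dim))

    def image(labels):
        noise = 0.1 * rng.normal(size=(len(labels), dim))
        return protos[labels] @ rotation + offset + noise

    # 1) Few-shot alignment from a small set of paired image/text embeddings.
    few_labels = np.arange(32) % num_classes
    W = fit_alignment(image(few_labels), text(few_labels))

    # 2) Text-driven adaptation: the classifier never sees an image.
    text_labels = np.arange(200) % num_classes
    centroids = train_text_classifier(text(text_labels), text_labels, num_classes)

    # 3) Apply the text-trained classifier to held-out image embeddings.
    test_labels = np.arange(200) % num_classes
    test_images = image(test_labels)
    raw_acc = np.mean(predict(centroids, test_images) == test_labels)
    aligned_acc = np.mean(predict(centroids, align(test_images, W)) == test_labels)
    return raw_acc, aligned_acc


if __name__ == "__main__":
    raw, aligned = run_demo()
    print(f"accuracy without alignment: {raw:.2f}")
    print(f"accuracy with alignment:    {aligned:.2f}")
```

In this toy setting the text-trained classifier is only useful on image embeddings after they have been mapped into the text space, which is the intuition behind bridging the modality gap before applying a decoder trained on text alone.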
Taxonomy
Topics: Surgical Simulation and Training · Medical Imaging and Analysis
