Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
Britty Baby, Vinkle Srivastav, Pooja P. Jain, Kun Yuan, Pietro Mascagni, Nicolas Padoy

TL;DR
This paper introduces CVS-AdaptNet, a multi-modal, multi-label model that uses textual prompts to improve the recognition of critical safety criteria in laparoscopic surgery, outperforming image-only baselines.
Contribution
It presents CVS-AdaptNet, a novel multi-label, multi-modal adaptation strategy for surgical safety recognition, leveraging textual prompts to enhance model performance.
Findings
CVS-AdaptNet achieves 57.6 mAP, surpassing the image-only baseline by 6 points.
Textual prompts improve multi-label classification accuracy in surgical safety recognition.
The multi-modal approach helps adapt generalist models to specialized surgical tasks.
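The mAP figure above is a mean of per-label average precision scores. The sketch below shows one common way to compute it for a multi-label task, using a toy set of four frames and three criteria. The labels and scores are made-up illustration data, and the exact averaging protocol used in the paper is not stated in this summary, so treat this as an assumption rather than the authors' evaluation code. The paper reports mAP on a 0-100 scale; this sketch returns a value in [0, 1].

```python
def average_precision(labels, scores):
    """AP for one label: mean of precision@k over the ranks k where a positive occurs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # A label with no positives contributes 0.0 here (a convention, not a standard).
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(label_matrix, score_matrix):
    """Average the per-label AP over all label columns."""
    num_labels = len(label_matrix[0])
    aps = []
    for j in range(num_labels):
        col_labels = [row[j] for row in label_matrix]
        col_scores = [row[j] for row in score_matrix]
        aps.append(average_precision(col_labels, col_scores))
    return sum(aps) / num_labels


# Hypothetical toy data: rows are frames, columns are the three criteria.
toy_labels = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
toy_scores = [[0.9, 0.2, 0.8], [0.3, 0.6, 0.4], [0.7, 0.5, 0.9], [0.1, 0.1, 0.2]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

Because each criterion is ranked and scored independently, a model can do well on one criterion and poorly on another, and the mean hides that spread; per-criterion AP is worth inspecting alongside the aggregate.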
Abstract
The Critical View of Safety (CVS) is crucial for safe laparoscopic cholecystectomy, yet assessing CVS criteria remains a complex and challenging task, even for experts. Traditional models for CVS recognition depend on vision-only models learning with costly, labor-intensive spatial annotations. This study investigates how text can be harnessed as a powerful tool for both training and inference in multi-modal surgical foundation models to automate CVS recognition. Unlike many existing multi-modal models, which are primarily adapted for multi-class classification, CVS recognition requires a multi-label framework. Zero-shot evaluation of existing multi-modal surgical models shows a significant performance gap for this task. To address this, we propose CVS-AdaptNet, a multi-label adaptation strategy that enhances fine-grained, binary classification across multiple labels by aligning image…
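The distinction between multi-class and multi-label use of image-text similarity can be illustrated with a minimal sketch. It is not the CVS-AdaptNet method, whose details are cut off in the abstract above. It assumes toy 3-dimensional embeddings, a hypothetical positive and negative text prompt per criterion, and a temperature of 0.1; all function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_class_scores(image_emb, prompt_embs, temperature=0.1):
    """Softmax over prompts: scores compete and must sum to 1."""
    return softmax([cosine(image_emb, p) / temperature for p in prompt_embs])


def multi_label_scores(image_emb, prompt_pairs, temperature=0.1):
    """One independent binary decision per criterion (assumed pos/neg prompt pair)."""
    scores = []
    for pos, neg in prompt_pairs:
        margin = (cosine(image_emb, pos) - cosine(image_emb, neg)) / temperature
        scores.append(sigmoid(margin))
    return scores


# Hypothetical toy embeddings: the image matches criteria 0 and 1, and is neutral on 2.
image = [1.0, 1.0, 0.0]
pairs = [
    ([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, -1.0, 0.0]),
    ([0.0, 0.0, 1.0], [0.0, 0.0, -1.0]),
]
mc = multi_class_scores(image, [pos for pos, _ in pairs])
ml = multi_label_scores(image, pairs)
```

In this toy case the softmax splits its probability mass between the two matching criteria, so neither can exceed roughly one half, while the per-criterion sigmoid scores both close to 1. That is the practical reason a frame satisfying several CVS criteria at once needs a multi-label formulation.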
Taxonomy
Topics: Surgical Simulation and Training · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
