Deep Multimodal Fusion for Surgical Feedback Classification
Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

TL;DR
This paper develops a multimodal machine learning model to classify real-time surgical feedback into five categories using text, audio, and video, aiming to automate feedback annotation during surgeries.
Contribution
It introduces a multi-label classification approach leveraging multimodal data and a staged training strategy for real-time surgical feedback analysis.
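To make the multi-label, multimodal setup concrete, here is a minimal sketch of one common fusion pattern: concatenate one embedding per modality and score each feedback category with its own sigmoid. The summary does not describe the paper's actual architecture, so the function name `fuse_and_classify`, the embedding sizes, the random weights, and the 0.5 threshold are all illustrative assumptions.

```python
import math
import random

# The five feedback categories named in the abstract.
CATEGORIES = ["Anatomic", "Technical", "Procedural", "Praise", "Visual Aid"]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse_and_classify(text_emb, audio_emb, video_emb, weights, biases, threshold=0.5):
    """Hypothetical fusion head (an assumption, not the paper's model).

    Concatenates the three modality embeddings, then applies one linear
    scorer plus sigmoid per category. Because each category is scored
    independently, a single feedback instance can receive several labels.
    """
    fused = list(text_emb) + list(audio_emb) + list(video_emb)
    probs = {}
    for name, w, b in zip(CATEGORIES, weights, biases):
        if len(w) != len(fused):
            raise ValueError("weight vector length must match fused embedding length")
        logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
        probs[name] = sigmoid(logit)
    labels = [name for name in CATEGORIES if probs[name] >= threshold]
    return probs, labels


# Toy inputs: random stand-ins for real text/audio/video encoder outputs.
rng = random.Random(0)
text_emb = [rng.uniform(-1, 1) for _ in range(4)]
audio_emb = [rng.uniform(-1, 1) for _ in range(3)]
video_emb = [rng.uniform(-1, 1) for _ in range(3)]
weights = [[rng.uniform(-1, 1) for _ in range(10)] for _ in CATEGORIES]
biases = [0.0] * len(CATEGORIES)

probs, labels = fuse_and_classify(text_emb, audio_emb, video_emb, weights, biases)
```

In a staged training setup of the kind the summary mentions, the per-modality encoders would plausibly be trained first and a fusion head like this one fitted afterwards, but that ordering is an assumption here rather than a detail given in the summary.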
Findings
Fusion improves classification performance by 3.1%.
Manual transcriptions significantly boost accuracy.
Staged training outperforms joint training methods.
Abstract
Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In this work, we leverage a clinically-validated five-category classification of surgical feedback: "Anatomic", "Technical", "Procedural", "Praise" and "Visual Aid". We then develop a multi-label machine learning model to classify these five categories of surgical feedback from inputs of text, audio, and video modalities. The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale. Our automated classification of surgical feedback achieves AUCs…
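The abstract reports performance as AUCs. For a multi-label classifier, AUC is usually computed separately for each category. The sketch below shows that computation using the pairwise (Mann-Whitney) definition of ROC AUC. The helper names `roc_auc` and `per_category_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_category_auc(labels_by_category, scores_by_category):
    """Compute one AUC per feedback category (multi-label evaluation)."""
    return {
        name: roc_auc(labels_by_category[name], scores_by_category[name])
        for name in labels_by_category
    }


# Toy example with made-up labels and scores for two of the five categories.
toy_labels = {
    "Anatomic": [1, 0, 1, 0],
    "Praise": [0, 0, 1, 1],
}
toy_scores = {
    "Anatomic": [0.9, 0.1, 0.4, 0.6],
    "Praise": [0.2, 0.3, 0.7, 0.8],
}
aucs = per_category_auc(toy_labels, toy_scores)
# aucs["Anatomic"] is 0.75 (3 of 4 pairs ranked correctly); aucs["Praise"] is 1.0
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for large evaluation sets.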
Taxonomy
Topics: Cardiac, Anesthesia and Surgical Outcomes · Surgical Simulation and Training · Hospital Admissions and Outcomes
