A vision-language model and platform for temporally mapping surgery from video
Dani Kiyasseh

TL;DR
This paper introduces Halsted, a vision-language model trained on the Halsted Surgical Atlas (over 650,000 annotated videos across eight surgical specialties) to map surgical activity from video, with the stated aim of supporting operative guidelines and autonomous robotic surgery.
Contribution
The paper presents Halsted, a vision-language model trained on the Halsted Surgical Atlas, and publicly releases a subset of that atlas, HSA-27k, for benchmarking.
Findings
Halsted outperforms previous models in surgical activity mapping.
The model is reported to be more comprehensive and efficient than previous models.
The Halsted platform enables surgeons to map procedures within minutes.
Abstract
Mapping surgery is fundamental to developing operative guidelines and enabling autonomous robotic surgery. Recent advances in artificial intelligence (AI) have shown promise in mapping the behaviour of surgeons from videos, yet current models remain narrow in scope, capturing limited behavioural components within single procedures, and offer limited translational value, as they remain inaccessible to practising surgeons. Here we introduce Halsted, a vision-language model trained on the Halsted Surgical Atlas (HSA), one of the most comprehensive annotated video libraries grown through an iterative self-labelling framework and encompassing over 650,000 videos across eight surgical specialties. To facilitate benchmarking, we publicly release HSA-27k, a subset of the Halsted Surgical Atlas. Halsted surpasses previous state-of-the-art models in mapping surgical activity while offering…
Taxonomy
Topics: Surgical Simulation and Training · Multimodal Machine Learning Applications · Soft Robotics and Applications
