A vision-language model and platform for temporally mapping surgery from video

Dani Kiyasseh

arXiv:2603.22583 · cs.CV · March 25, 2026

TL;DR

This paper introduces Halsted, a vision-language model trained on the Halsted Surgical Atlas, an annotated library of over 650,000 surgical videos across eight specialties. Halsted automatically maps surgical activity from video, with the aim of supporting operative guidelines and autonomous robotic surgery.

Contribution

The paper presents Halsted, a vision-language model trained on the Halsted Surgical Atlas (HSA), and publicly releases HSA-27k, a subset of the atlas, for benchmarking.

Findings

01 Halsted outperforms previous models in surgical activity mapping.

02 The model offers greater comprehensiveness and efficiency.

03 The Halsted platform enables surgeons to map procedures within minutes.

Abstract

Mapping surgery is fundamental to developing operative guidelines and enabling autonomous robotic surgery. Recent advances in artificial intelligence (AI) have shown promise in mapping the behaviour of surgeons from videos, yet current models remain narrow in scope, capturing limited behavioural components within single procedures, and offer limited translational value, as they remain inaccessible to practising surgeons. Here we introduce Halsted, a vision-language model trained on the Halsted Surgical Atlas (HSA), one of the most comprehensive annotated video libraries grown through an iterative self-labelling framework and encompassing over 650,000 videos across eight surgical specialties. To facilitate benchmarking, we publicly release HSA-27k, a subset of the Halsted Surgical Atlas. Halsted surpasses previous state-of-the-art models in mapping surgical activity while offering…

Taxonomy

Topics: Surgical Simulation and Training · Multimodal Machine Learning Applications · Soft Robotics and Applications