SHANDS: A Multi-View Dataset and Benchmark for Surgical Hand-Gesture and Error Recognition Toward Medical Training
Le Ma, Thiago Freitas dos Santos, Nadia Magnenat-Thalmann, Katarzyna Wac

TL;DR
This paper introduces Surgical-Hands, a comprehensive multi-view dataset for surgical hand-gesture and error recognition, aimed at improving AI-based assessments in medical training.
Contribution
It provides a large-scale, multi-view video dataset with detailed annotations and evaluation protocols, facilitating robust AI development for surgical training assessment.
Findings
Benchmarking shows current models achieve moderate accuracy on the dataset.
Multi-view approaches outperform single-view models in gesture and error recognition.
The dataset enables evaluation of cross-view generalization for surgical skill assessment.
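One way to picture a cross-view generalization check is a leave-one-view-out split: train on clips from four cameras and test on the fifth. The sketch below is illustrative only and is not the paper's evaluation protocol, which this summary does not specify. The counts (five views, 52 participants, two procedures, three trials per procedure) come from the abstract. The record layout, the integer view IDs, the function names `build_clip_index` and `leave_one_view_out`, and the assumption that every camera recorded every trial are hypothetical.

```python
from itertools import product

# Figures stated in the abstract.
NUM_VIEWS = 5
NUM_PARTICIPANTS = 52
PROCEDURES = ("incision", "suturing")
TRIALS_PER_PROCEDURE = 3


def build_clip_index():
    """Enumerate one hypothetical record per (participant, procedure, trial, view).

    Assumes every camera captured every trial, which the summary does not confirm.
    """
    return [
        {"participant": p, "procedure": proc, "trial": t, "view": v}
        for p, proc, t, v in product(
            range(NUM_PARTICIPANTS),
            PROCEDURES,
            range(TRIALS_PER_PROCEDURE),
            range(NUM_VIEWS),
        )
    ]


def leave_one_view_out(clips, held_out_view):
    """Split clips so the held-out camera view appears only in the test set."""
    train = [c for c in clips if c["view"] != held_out_view]
    test = [c for c in clips if c["view"] == held_out_view]
    return train, test


clips = build_clip_index()
train, test = leave_one_view_out(clips, held_out_view=0)
print(len(clips), len(train), len(test))
```

Under these assumptions the index holds 52 × 2 × 3 × 5 = 1560 clips, of which 312 belong to the held-out view. Repeating the split for each of the five views and averaging the scores would give a single cross-view figure. A stricter variant would also hold out participants so that no person appears in both sets.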
Abstract
In surgical training for medical students, proficiency development relies on expert-led skill assessment, which is costly, time-limited, and difficult to scale, and which confines expertise to institutions with available specialists. Automated AI-based assessment offers a viable alternative, but progress is constrained by the lack of datasets containing realistic trainee errors and the multi-view variability needed to train robust computer vision approaches. To address this gap, we present Surgical-Hands (SHands), a large-scale multi-view video dataset for surgical hand-gesture and error recognition for medical training. SHands captures linear incision and suturing, performed by 52 participants (20 experts and 32 trainees), using five RGB cameras from complementary viewpoints; each participant completed three standardized trials per procedure. The videos are annotated at the frame level…
