YOLOv10-Based Multi-Task Framework for Hand Localization and Laterality Classification in Surgical Videos
Kedi Sun, Le Zhang

TL;DR
This paper introduces a YOLOv10-based multi-task framework for real-time hand localization and laterality classification in surgical videos, aiming to enhance intraoperative decision support.
Contribution
It presents a novel multi-task detection model trained on surgical data, improving robustness and enabling simultaneous hand detection and laterality classification.
Findings
Achieves 67% and 71% accuracy for left- and right-hand classification, respectively
Reaches an mAP of 0.33 on surgical videos
Runs inference in real time, indicating potential for intraoperative use
Abstract
Real-time hand tracking in trauma surgery is essential for supporting rapid and precise intraoperative decisions. We propose a YOLOv10-based framework that simultaneously localizes hands and classifies their laterality (left or right) in complex surgical scenes. The model is trained on the Trauma THOMPSON Challenge 2025 Task 2 dataset, consisting of first-person surgical videos with annotated hand bounding boxes. Extensive data augmentation and a multi-task detection design improve robustness against motion blur, lighting variations, and diverse hand appearances. Evaluation demonstrates accurate left-hand (67%) and right-hand (71%) classification, while distinguishing hands from the background remains challenging. The model achieves an mAP of 0.33 and maintains real-time inference, highlighting its potential for intraoperative deployment. This work establishes a…
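The abstract reports per-class laterality accuracy and an mAP figure, both of which rest on two standard building blocks: box overlap (IoU) and a confusion matrix over the classes. The following is a minimal sketch of those two pieces, not the authors' evaluation code. The function names `iou` and `per_class_accuracy`, the `(x1, y1, x2, y2)` box layout, and the row-equals-true-class convention are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes give no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_accuracy(
    confusion: Sequence[Sequence[int]], labels: List[str]
) -> Dict[str, float]:
    """Fraction of each true class that was predicted correctly.

    Assumes confusion[i][j] counts samples of true class i
    predicted as class j (hypothetical convention).
    """
    result = {}
    for i, label in enumerate(labels):
        total = sum(confusion[i])
        result[label] = confusion[i][i] / total if total else 0.0
    return result
```

Under this convention, a "background" row and column can be added to the matrix to capture missed and spurious detections, which is where the abstract notes the remaining difficulty lies.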
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Surgical Simulation and Training
