YOLOv10-Based Multi-Task Framework for Hand Localization and Laterality Classification in Surgical Videos

Kedi Sun; Le Zhang

arXiv:2602.18959 · cs.CV · February 24, 2026

Open Access

TL;DR

This paper introduces a YOLOv10-based multi-task framework for real-time hand localization and laterality classification in surgical videos, aiming to enhance intraoperative decision support.

Contribution

It presents a novel multi-task detection model trained on surgical data, improving robustness and enabling simultaneous hand detection and laterality classification.

Findings

01

Achieves 67% and 71% accuracy for left- and right-hand classification, respectively

02

Reaches an mAP[0.5:0.95] of 0.33 on surgical videos

03

Runs in real time, making intraoperative use feasible
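
The findings report laterality accuracy separately for left and right hands, but this page does not state how that accuracy is computed. The sketch below assumes one plausible definition: for each ground-truth hand, find the unused prediction with the highest IoU (at least 0.5, ignoring class), and count the hand as correct only if that prediction carries the same side; unmatched hands count as errors. The label strings "left"/"right", the 0.5 threshold, and the function names are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a box is (x1, y1, x2, y2) in pixels and a
# labelled box pairs it with a side string, "left" or "right".
Box = Tuple[float, float, float, float]
Labelled = Tuple[Box, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def laterality_accuracy(gts: List[Labelled], preds: List[Labelled],
                        iou_thr: float = 0.5) -> Dict[str, float]:
    """Assumed per-side accuracy for one frame: the fraction of ground-truth
    hands of each side that are matched (class-agnostically, IoU >= iou_thr)
    to a prediction labelled with the same side."""
    used = set()                      # prediction indices already matched
    hits = {"left": 0, "right": 0}
    totals = {"left": 0, "right": 0}
    for gt_box, gt_side in gts:
        totals[gt_side] += 1
        best_j, best_iou = -1, iou_thr
        for j, (p_box, _) in enumerate(preds):
            if j in used:
                continue
            overlap = iou(gt_box, p_box)
            if overlap >= best_iou:   # keep the best-overlapping free prediction
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            if preds[best_j][1] == gt_side:
                hits[gt_side] += 1
    # NaN marks a side with no ground-truth hands in this frame.
    return {s: hits[s] / totals[s] if totals[s] else float("nan") for s in totals}
```

For example, if one left and one right hand are both localized exactly but both predicted as "left", the function returns {"left": 1.0, "right": 0.0}: localization succeeded for both, but laterality was wrong for the right hand.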

Abstract

Real-time hand tracking in trauma surgery is essential for supporting rapid and precise intraoperative decisions. We propose a YOLOv10-based framework that simultaneously localizes hands and classifies their laterality (left or right) in complex surgical scenes. The model is trained on the Trauma THOMPSON Challenge 2025 Task 2 dataset, consisting of first-person surgical videos with annotated hand bounding boxes. Extensive data augmentation and a multi-task detection design improve robustness against motion blur, lighting variations, and diverse hand appearances. Evaluation demonstrates accurate left-hand (67%) and right-hand (71%) classification, while distinguishing hands from the background remains challenging. The model achieves an $\mathrm{mAP}_{[0.5:0.95]}$ of 0.33 and maintains real-time inference, highlighting its potential for intraoperative deployment. This work establishes a…
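
The headline metric, mAP[0.5:0.95], averages detection AP over ten IoU thresholds (0.50, 0.55, ..., 0.95), so a box earns full credit only if it overlaps its hand tightly. The page does not say which evaluator produced the 0.33 figure; the sketch below assumes the common COCO-style recipe: per class, sort predictions by confidence, greedily match each to the best unmatched ground truth above the threshold, build the precision envelope, sample it at 101 recall levels, then average over thresholds and classes. It further assumes that left and right hands are scored as two separate classes; the data layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pred = Tuple[str, float, Box]            # (image_id, confidence, box), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(gts: Dict[str, List[Box]], preds: List[Pred], thr: float) -> float:
    """AP for one class at one IoU threshold (COCO-style, 101-point)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return float("nan")  # class absent: excluded from the mean
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = []
    # Highest-confidence predictions claim ground truths first.
    for img, _, box in sorted(preds, key=lambda p: -p[1]):
        best_j, best_iou = -1, thr
        for j, gt_box in enumerate(gts.get(img, [])):
            if matched[img][j]:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True
        tp.append(1.0 if best_j >= 0 else 0.0)
    if not tp:
        return 0.0
    tp_cum = np.cumsum(tp)
    recall = tp_cum / n_gt
    precision = tp_cum / np.arange(1, len(tp) + 1)
    # Precision envelope: make precision non-increasing as recall grows.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sample precision at recall levels 0.00, 0.01, ..., 1.00 (0 beyond max recall).
    idx = np.searchsorted(recall, np.linspace(0.0, 1.0, 101), side="left")
    sampled = np.where(idx < len(precision),
                       precision[np.minimum(idx, len(precision) - 1)], 0.0)
    return float(sampled.mean())


def map_50_95(gts: Dict[str, Dict[str, List[Box]]], preds: Dict[str, List[Pred]]) -> float:
    """Mean AP over classes (e.g. 'left', 'right') and IoU thresholds 0.50:0.05:0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    aps = [average_precision(gts[c], preds.get(c, []), t) for c in gts for t in thresholds]
    return float(np.nanmean(aps))
```

Under this definition, a single left-hand prediction whose box reaches IoU 0.72 with its ground truth scores AP 1 at the five thresholds from 0.50 to 0.70 and AP 0 at the remaining five, for an mAP[0.5:0.95] of 0.5. Treating laterality as a class label also means that a well-placed box with the wrong side counts as a false positive for one class and a miss for the other.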

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Surgical Simulation and Training