Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment
Xueying Shi, Yueming Jin, Qi Dou, Jing Qin, and Pheng-Ann Heng

TL;DR
This paper introduces an unsupervised domain adaptation framework for robotic gesture recognition that aligns multi-modal data from simulators to real robots, significantly improving accuracy without requiring real robot annotations.
Contribution
It presents a novel multi-modal domain adaptation method that aligns kinematic and visual data using temporal cues and correlation-based features, enhancing transferability in surgical gesture recognition.
Findings
Achieves up to 12.91% improvement in accuracy
Improves F1 score by up to 20.16%
Effectively transfers knowledge from simulator to real robot
Abstract
Automated surgical gesture recognition is of great importance in robot-assisted minimally invasive surgery. However, existing methods assume that training and testing data come from the same domain, and they suffer severe performance degradation when a domain gap exists, such as that between a simulator and a real robot. In this paper, we propose a novel unsupervised domain adaptation framework that simultaneously transfers multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot. It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in the multi-modal data for gesture recognition. Specifically, we first propose MDO-K to align kinematics; it exploits temporal continuity to transfer motion directions, whose domain gap is smaller than that of position values, relieving the adaptation burden. Moreover, we…
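The idea of transferring motion directions instead of position values can be illustrated with a minimal sketch. This is not the authors' MDO-K module; it only shows, under stated assumptions, why directions are less sensitive to a domain gap than raw positions. The function name `motion_directions`, the `(T, D)` array layout, and the toy "simulator" and "real" trajectories are all hypothetical.

```python
import numpy as np

def motion_directions(positions, eps=1e-8):
    """Turn a (T, D) position sequence into (T-1, D) unit direction vectors.

    Illustrative only: consecutive frames are differenced (temporal
    continuity), then each displacement is normalized to unit length.
    Stationary frames yield a zero vector.
    """
    positions = np.asarray(positions, dtype=float)
    deltas = np.diff(positions, axis=0)                    # frame-to-frame displacement
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)  # displacement magnitude
    return deltas / np.maximum(norms, eps)                 # guard against division by zero

# Hypothetical trajectories: the same motion, recorded with a different
# scale and origin (a stand-in for a simulator-to-robot gap).
sim = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
real = 0.01 * sim + np.array([5.0, -3.0, 2.0])

# Raw positions differ between the two domains, but directions agree.
print(np.allclose(sim, real))
print(np.allclose(motion_directions(sim), motion_directions(real)))
```

In this toy case the raw positions disagree because of the scale and offset, while the normalized directions coincide, which is the intuition behind aligning directions rather than positions. A real simulator-to-robot gap is more complex than a uniform scale and shift.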
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications
