Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment

Xueying Shi, Yueming Jin, Qi Dou, Jing Qin, and Pheng-Ann Heng

arXiv:2103.04075 · cs.CV · July 20, 2021 · 1 citation

TL;DR

This paper introduces an unsupervised domain adaptation framework for robotic gesture recognition that aligns multi-modal data from simulators to real robots, significantly improving accuracy without requiring real robot annotations.

Contribution

It presents a novel multi-modal domain adaptation method that aligns kinematic and visual data using temporal cues and correlation-based features, enhancing transferability in surgical gesture recognition.

Findings

01. Achieves up to 12.91% accuracy improvement

02. Improves F1 score by 20.16%

03. Effectively transfers knowledge from simulator to real robot

Abstract

Automated surgical gesture recognition is of great importance in robot-assisted minimally invasive surgery. However, existing methods assume that training and testing data come from the same domain, and they suffer severe performance degradation when a domain gap exists, such as the gap between a simulator and a real robot. In this paper, we propose a novel unsupervised domain adaptation framework that simultaneously transfers multi-modal knowledge, i.e., both kinematic and visual data, from simulator to real robot. It remedies the domain gap with enhanced transferable features by exploiting temporal cues in videos and the inherent correlations between modalities for gesture recognition. Specifically, we first propose MDO-K to align kinematics, which exploits temporal continuity to transfer motion directions, which exhibit a smaller domain gap than position values, thereby relieving the adaptation burden. Moreover, we…
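To illustrate why motion directions may carry a smaller domain gap than raw positions, the following is a minimal sketch, not the paper's MDO-K implementation. It assumes kinematics are given as a sequence of 3D positions and takes the normalized difference between consecutive frames as the "motion direction". The function name `motion_directions`, the `eps` threshold, and the toy trajectories are all hypothetical and introduced only for illustration.

```python
import math


def motion_directions(positions, eps=1e-8):
    """Convert a sequence of positions into unit motion-direction vectors.

    Hypothetical helper for illustration only: each direction is the
    normalized difference between two consecutive frames. Frames with
    (near-)zero displacement map to a zero vector.
    """
    directions = []
    for prev, curr in zip(positions, positions[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm < eps:
            directions.append([0.0] * len(delta))
        else:
            directions.append([d / norm for d in delta])
    return directions


# Toy trajectories (assumed data): the same gesture recorded with a
# different workspace offset and scale in "simulator" and "real robot".
sim_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
real_positions = [(5.0, 5.0, 5.0), (5.5, 5.0, 5.0), (5.5, 6.0, 5.0)]

sim_dirs = motion_directions(sim_positions)
real_dirs = motion_directions(real_positions)

# Raw positions differ between the two domains, but directions coincide.
print(sim_dirs == real_dirs)
```

In this toy case the position values differ in both offset and scale, while the direction sequences are identical, which is the intuition behind adapting on directions instead of positions. Real simulator and robot data would also differ in noise and timing, which this sketch does not model.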

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications