Gesture First, LLM-Assisted Voice Complement: Exploring Multimodal Robot 'Puppeteer' Teleoperation Via Virtual Counterpart in Augmented Reality
Yuchong Zhang, Bastian Orthmann, Shichen Ji, Michael Welle, Jonne Van Haastregt, Danica Kragic

TL;DR
This study compares gesture-only and voice+gesture interaction modes in AR-based robot teleoperation, revealing trade-offs between efficiency, flexibility, and user workload, and offers design guidelines for multimodal control.
Contribution
The paper introduces a multimodal AR 'puppeteer' system for robot control and empirically evaluates how gesture and voice modalities affect performance and user experience.
Findings
Gesture-only control is more reliable and efficient for time-critical tasks.
Voice+gesture control offers greater flexibility but can increase workload due to latency.
Prior robotics expertise influences user performance and experience.
Abstract
Robot teleoperation via augmented reality (AR) offers a promising path toward more intuitive human-robot interaction (HRI). We present a head-mounted AR 'puppeteer' system in which users control a physical robot by interacting with its virtual counterpart robot using large language model (LLM)-assisted voice commands and hand-gesture interaction on the Meta Quest 3. In a within-subject user study with 42 participants performing an AR-based robotic pick-and-place pattern-matching task, we empirically compare two interaction conditions: gesture-only (GO) and combined voice+gesture (VG) on performance and user experience (UX). In VG, voice and gesture operate in a sequential role-allocated manner, with voice handling high-level navigation and gesture handling fine manipulation. Our results show that GO currently provides more reliable and efficient control for this time-critical task,…
Taxonomy
Topics: Robotics and Automated Systems · AI in Service Interactions · Speech and dialogue systems
