MRPoS: Mixed Reality-Based Robot Navigation Interface Using Spatial Pointing and Speech with Large Language Model
Eduardo Iglesius, Masato Kobayashi, Yuki Uranishi

TL;DR
MRPoS introduces a multimodal MR interface combining spatial pointing and LLM-based speech to improve robot navigation, reducing task time and workload compared to gesture-based systems.
Contribution
This paper presents a novel MR navigation interface that replaces manual gestures with natural speech and spatial pointing, enhancing accessibility and efficiency.
Findings
Significantly reduced task completion time.
Lowered workload for users.
Improved accessibility for beginners.
Abstract
Recent advancements have made robot navigation more intuitive by transitioning from traditional 2D displays to spatially aware Mixed Reality (MR) systems. However, current MR interfaces often rely on manual "air tap" gestures for goal placement, which can be repetitive and physically demanding, especially for beginners. This paper proposes the Mixed Reality-Based Robot Navigation Interface using Spatial Pointing and Speech (MRPoS). This novel framework replaces complex hand gestures with a natural, multimodal interface combining spatial pointing with Large Language Model (LLM)-based speech interaction. By combining both sources of information, the system translates verbal intent into navigation goals that are visualized in MR. Comprehensive experiments comparing MRPoS against conventional gesture-based systems demonstrate that our approach significantly reduces task completion time and…
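The abstract does not say how a pointing direction becomes a navigation goal. One common way to do this, shown here purely as an illustrative sketch and not as the authors' method, is to intersect the pointing ray with the floor plane. The function name `pointing_goal_on_floor`, the z-up coordinate convention, and the flat-floor assumption are all hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def pointing_goal_on_floor(
    origin: Vec3,
    direction: Vec3,
    floor_height: float = 0.0,
) -> Optional[Tuple[float, float]]:
    """Intersect a pointing ray with the horizontal plane z = floor_height.

    Assumptions (not from the paper): z is the up axis, the floor is flat,
    and `origin`/`direction` are expressed in the robot's map frame.
    Returns the (x, y) goal, or None if the ray never reaches the floor.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction

    # A ray parallel to the floor never intersects it.
    if abs(dz) < 1e-9:
        return None

    # Solve oz + t * dz = floor_height for the ray parameter t.
    t = (floor_height - oz) / dz

    # t <= 0 means the floor lies behind the pointing origin.
    if t <= 0:
        return None

    return (ox + t * dx, oy + t * dy)


if __name__ == "__main__":
    # Hand at 1.5 m height, pointing forward and downward.
    print(pointing_goal_on_floor((0.0, 0.0, 1.5), (1.0, 0.0, -1.5)))
```

In a full system, a spoken command interpreted by the LLM could then confirm or adjust the candidate goal before it is sent to the robot; that step is omitted here because the abstract gives no details about it.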
Taxonomy
Topics: Hand Gesture Recognition Systems · Augmented Reality Applications · Social Robot Interaction and HRI
