SIAgent: Spatial Interaction Agent via LLM-powered Eye-Hand Motion Intent Understanding in VR

Zhimin Wang, Chenyu Gu, and Feng Lu

arXiv:2603.00522 · cs.HC · March 3, 2026

Open Access

TL;DR

SIAgent introduces an 'Intent-to-Operation' framework in VR that uses natural eye-hand motions for interaction, improving accuracy, reducing fatigue, and eliminating gesture memorization.

Contribution

The paper presents a novel intent recognition system that translates spatial data into natural language, enabling intuitive VR interactions without predefined gestures.

Findings

1. Achieved 97.2% intent recognition accuracy
2. Reduced arm fatigue compared to traditional methods
3. Enhanced usability and user preference

Abstract

Eye-hand coordinated interaction is becoming a mainstream interaction modality in Virtual Reality (VR) user interfaces. Current paradigms for this multimodal interaction require users to learn predefined gestures and memorize multiple gesture-task associations, which can be summarized as an "Operation-to-Intent" paradigm. This paradigm increases users' learning costs and has low interaction error tolerance. In this paper, we propose SIAgent, a novel "Intent-to-Operation" framework allowing users to express interaction intents through natural eye-hand motions based on common sense and habits. Our system features two main components: (1) intent recognition that translates spatial interaction data into natural language and infers user intent, and (2) agent-based execution that generates an agent to execute corresponding tasks. This eliminates the need for gesture memorization and…
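
To make the first component more concrete, the sketch below shows one way spatial interaction data could be turned into a natural-language description and wrapped in a prompt for a language model. It is an illustration only, not the authors' implementation: the `Sample` schema, the axis convention, the 2 cm stillness threshold, and the functions `describe` and `build_prompt` are all assumptions introduced here, and no model is actually called.

```python
from dataclasses import dataclass
import math


@dataclass
class Sample:
    """One eye-hand tracking sample (hypothetical schema, not from the paper)."""
    t: float          # timestamp in seconds
    gaze_target: str  # name of the object the gaze ray currently hits
    hand_pos: tuple   # (x, y, z) hand position in metres
    pinching: bool    # whether thumb and index finger are touching


# Assumed axis convention: +x is right, +y is up, +z is forward.
_DIRECTIONS = (("left", "right"), ("down", "up"), ("backward", "forward"))


def describe(samples):
    """Summarise a window of samples as one English sentence."""
    if not samples:
        return "No motion was recorded."
    first, last = samples[0], samples[-1]
    delta = [b - a for a, b in zip(first.hand_pos, last.hand_pos)]
    dist = math.sqrt(sum(d * d for d in delta))
    duration = last.t - first.t
    grip = "while pinching" if all(s.pinching for s in samples) else "with an open hand"
    if dist < 0.02:  # under 2 cm counts as holding still (arbitrary threshold)
        motion = "held the hand still"
    else:
        # Report only the axis with the largest displacement.
        axis = max(range(3), key=lambda i: abs(delta[i]))
        direction = _DIRECTIONS[axis][delta[axis] > 0]
        motion = f"moved the hand {dist * 100:.0f} cm {direction}"
    return (f"Over {duration:.1f} s the user looked at '{last.gaze_target}' "
            f"and {motion} {grip}.")


def build_prompt(description, intents):
    """Wrap the description in a prompt asking a language model to pick an intent."""
    options = ", ".join(intents)
    return (f"Observation: {description}\n"
            f"Possible intents: {options}\n"
            "Which intent is the user most likely expressing?")
```

In this sketch, a window of samples would be passed to `describe`, and the resulting sentence to `build_prompt` together with a list of candidate intents such as `["move", "rotate", "delete"]`. The returned string could then be sent to whichever language model the system uses; that step is left out because the source does not specify it.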

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems · Virtual Reality Applications and Impacts