SIAgent: Spatial Interaction Agent via LLM-powered Eye-Hand Motion Intent Understanding in VR
Zhimin Wang, Chenyu Gu, and Feng Lu

TL;DR
SIAgent introduces an 'Intent-to-Operation' framework in VR that uses natural eye-hand motions for interaction, improving accuracy, reducing fatigue, and eliminating gesture memorization.
Contribution
The paper presents a novel intent recognition system that translates spatial data into natural language, enabling intuitive VR interactions without predefined gestures.
Findings
Achieved 97.2% intent recognition accuracy
Reduced arm fatigue compared to traditional methods
Enhanced usability and user preference
Abstract
Eye-hand coordinated interaction is becoming a mainstream interaction modality in Virtual Reality (VR) user interfaces. Current paradigms for this multimodal interaction require users to learn predefined gestures and memorize multiple gesture-task associations, which can be summarized as an "Operation-to-Intent" paradigm. This paradigm increases users' learning costs and has low tolerance for interaction errors. In this paper, we propose SIAgent, a novel "Intent-to-Operation" framework that allows users to express interaction intents through natural eye-hand motions based on common sense and habits. Our system features two main components: (1) intent recognition, which translates spatial interaction data into natural language and infers user intent, and (2) agent-based execution, which generates an agent to execute the corresponding tasks. This eliminates the need for gesture memorization and…
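To make the first component more concrete, here is a minimal sketch of how a short window of eye-hand samples might be turned into a natural-language description that a language model could then reason over. It is an illustration only, not the authors' implementation: the `Sample` record, the `describe_motion` function, the axis convention (x right, y up, z forward), and the sentence template are all assumptions introduced here, and no LLM is actually called.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical tracking sample (field names are assumptions)."""
    t: float            # timestamp in seconds
    gaze_target: str    # name of the object the gaze ray currently hits
    hand: tuple         # hand position (x, y, z) in metres


# Assumed axis convention: x = right, y = up, z = forward.
AXES = (("left", "right"), ("down", "up"), ("backward", "forward"))


def describe_motion(samples):
    """Summarise a window of eye-hand samples as one English sentence."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]

    # Net hand displacement over the window, per axis.
    delta = [b - a for a, b in zip(first.hand, last.hand)]
    # Dominant axis is the one with the largest absolute displacement.
    axis = max(range(3), key=lambda i: abs(delta[i]))
    direction = AXES[axis][1] if delta[axis] > 0 else AXES[axis][0]
    distance = sum(d * d for d in delta) ** 0.5

    # Object that was gazed at in the most samples.
    target = Counter(s.gaze_target for s in samples).most_common(1)[0][0]
    duration = last.t - first.t

    return (f"Over {duration:.1f} s the user mostly looked at '{target}' "
            f"and moved the hand {distance:.2f} m, mainly {direction}.")


window = [
    Sample(0.0, "cube", (0.0, 1.0, 0.5)),
    Sample(0.5, "cube", (0.1, 1.0, 0.5)),
    Sample(1.0, "lamp", (0.3, 1.0, 0.5)),
]
description = describe_motion(window)
print(description)
```

In a pipeline of the kind the abstract describes, a sentence like this would be placed in a prompt together with the scene context so that the model can infer the intended operation. How SIAgent actually encodes spatial data is not stated in this summary.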
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems · Virtual Reality Applications and Impacts
