Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models
Xijie Zhang, Fengliang He, Hong-Ning Dai

TL;DR
This paper introduces a novel acoustic gesture recognition framework for VR/AR that leverages large language models, enabling effective few-shot and zero-shot learning without extensive retraining, thus improving interaction efficiency.
Contribution
The paper is the first to use large language models for CIR-based gesture recognition in VR/AR, and it addresses few-shot learning challenges with a new differential CIR dataset.
Findings
Achieves accuracy comparable to classical methods without retraining
Uses differential CIR data to improve gesture recognition
Demonstrates effectiveness across diverse gesture categories
Abstract
Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy leakage concerns. Acoustic sensing provides an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, the channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods often rely on extensive training of models on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite LLMs' strengths, it is non-trivial to achieve few-shot and…
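To make the CIR and differential CIR terms above more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' pipeline: the function names `estimate_cir` and `differential_cir`, the cross-correlation estimator, and the frame-to-frame difference are all assumptions chosen for illustration. The sketch assumes a known probe signal whose autocorrelation is close to an impulse, so that correlating the received signal with the probe approximates the channel taps.

```python
import numpy as np


def estimate_cir(received, probe, num_taps):
    """Approximate the channel impulse response by cross-correlation.

    Assumption: the probe's autocorrelation is close to an impulse, so the
    cross-correlation at non-negative lags approximates the channel taps.
    """
    corr = np.correlate(received, probe, mode="full")
    zero_lag = len(probe) - 1  # index of lag 0 in the "full" output
    # Normalize by the probe energy so an ideal probe recovers the taps.
    return corr[zero_lag:zero_lag + num_taps] / np.dot(probe, probe)


def differential_cir(cir_frames):
    """Difference between consecutive CIR frames (frames along axis 0).

    Static reflections (walls, furniture) cancel out, leaving mostly the
    changes caused by motion such as a hand gesture.
    """
    return np.diff(cir_frames, axis=0)


# Toy usage: an impulse probe and a hypothetical 3-tap channel.
probe = np.array([1.0, 0.0, 0.0, 0.0])
channel = np.array([0.5, -0.2, 0.1])
received = np.convolve(probe, channel)
cir = estimate_cir(received, probe, num_taps=3)

# Three frames: the channel is static, then one tap changes.
frames = np.stack([channel, channel, channel + np.array([0.0, 0.3, 0.0])])
delta = differential_cir(frames)
```

In this toy case the first row of `delta` is all zeros because nothing moved between the first two frames, while the second row isolates the single tap that changed. How such differential frames are then presented to an LLM is not specified in the visible part of the abstract, so that step is left out here.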
Taxonomy
Topics: Hand Gesture Recognition Systems · Interactive and Immersive Displays · Face recognition and analysis
