Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models

Xijie Zhang; Fengliang He; Hong-Ning Dai

arXiv:2511.07085·cs.HC·November 11, 2025

Open Access

TL;DR

This paper introduces a novel acoustic gesture recognition framework for VR/AR that leverages large language models, enabling effective few-shot and zero-shot learning without extensive retraining, thus improving interaction efficiency.

Contribution

The paper is the first to use large language models for CIR-based gesture recognition in VR/AR, and it addresses few-shot learning challenges with a new differential CIR dataset.

Findings

01 Achieves accuracy comparable to classical methods without retraining

02 Uses differential CIR data to improve gesture recognition

03 Demonstrates effectiveness across diverse gesture categories
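
The second finding can be made concrete with a small sketch. This summary does not give the paper's probe signal or its exact definition of differential CIR, so everything below is an assumption for illustration: the probe is random noise rather than a real inaudible signal, the CIR is estimated by ordinary least squares, and "differential CIR" is taken to mean the difference between consecutive CIR frames, which cancels static reflections and leaves the change caused by a moving hand. The names `estimate_cir` and `differential_cir` are hypothetical.

```python
import numpy as np

def estimate_cir(probe, received, num_taps):
    """Least-squares CIR estimate h such that probe convolved with h ~= received.

    Assumption: a generic estimator, not necessarily the paper's method.
    """
    n = len(probe)
    # Convolution matrix: column k holds the probe delayed by k samples.
    conv = np.zeros((n + num_taps - 1, num_taps))
    for k in range(num_taps):
        conv[k:k + n, k] = probe
    h, *_ = np.linalg.lstsq(conv, received[:n + num_taps - 1], rcond=None)
    return h

def differential_cir(cir_frames):
    """Frame-to-frame difference (assumed definition of differential CIR)."""
    return np.diff(cir_frames, axis=0)

# Synthetic demo: two identical frames, then one extra reflection at tap 12.
rng = np.random.default_rng(0)
probe = rng.standard_normal(256)      # stand-in for the emitted signal
static = np.zeros(32)
static[0], static[5] = 1.0, 0.4       # direct path plus one static echo
moved = static.copy()
moved[12] += 0.2                      # hypothetical gesture-induced echo

frames = np.array([
    estimate_cir(probe, np.convolve(probe, h), 32)
    for h in (static, static, moved)
])
diff = differential_cir(frames)
changed_tap = int(np.argmax(np.abs(diff[1])))
```

In this noiseless toy setup, `diff[0]` is numerically zero because nothing moved between the first two frames, while `diff[1]` is concentrated at the tap where the extra reflection appeared. Real recordings would add noise, so the static paths would be suppressed rather than cancelled exactly.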

Abstract

Natural and efficient interaction remains a critical challenge for virtual reality and augmented reality (VR/AR) systems. Vision-based gesture recognition suffers from high computational cost, sensitivity to lighting conditions, and privacy leakage concerns. Acoustic sensing provides an attractive alternative: by emitting inaudible high-frequency signals and capturing their reflections, channel impulse response (CIR) encodes how gestures perturb the acoustic field in a low-cost and user-transparent manner. However, existing CIR-based gesture recognition methods often rely on extensive training of models on large labeled datasets, making them unsuitable for few-shot VR scenarios. In this work, we propose the first framework that leverages large language models (LLMs) for CIR-based gesture recognition in VR/AR systems. Despite LLMs' strengths, it is non-trivial to achieve few-shot and…

Taxonomy

Topics: Hand Gesture Recognition Systems · Interactive and Immersive Displays · Face recognition and analysis