Teaching LLMs to See and Guide: Context-Aware Real-Time Assistance in Augmented Reality
Mahya Qorbani, Kamran Paynabar, Mohsen Moghaddam

TL;DR
This paper develops a context-aware LLM assistant for AR/VR environments that integrates multimodal data to provide relevant, real-time support during complex industrial tasks.
Contribution
It introduces a multimodal, context-aware LLM framework with incremental prompting for real-time AR/VR assistance, evaluated on the HoloAssist dataset.
Findings
Multimodal context improves response accuracy.
Under incremental prompting, performance improves as the supplied context grows richer.
The system effectively integrates gaze, actions, and dialogue for support.
Abstract
The growing adoption of augmented and virtual reality (AR and VR) technologies in industrial training and on-the-job assistance has created new opportunities for intelligent, context-aware support systems. As workers perform complex tasks guided by AR and VR, these devices capture rich streams of multimodal data, including gaze, hand actions, and task progression, that can reveal user intent and task state in real time. Leveraging this information effectively remains a major challenge. In this work, we present a context-aware large language model (LLM) assistant that integrates diverse data modalities, such as hand actions, task steps, and dialogue history, into a unified framework for real-time question answering. To systematically study how context influences performance, we introduce an incremental prompting framework, where each model version receives progressively richer contextual…
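The incremental prompting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Context` fields, the `build_prompt` and `incremental_prompts` helpers, the prompt wording, and the order in which modalities are added (task step, then hand actions, then dialogue history) are all assumptions made for illustration. The sketch only shows the general pattern of producing a series of prompts, each with strictly more context than the last, so that answers can be compared across levels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Context:
    """Multimodal context available at question time (field names are hypothetical)."""
    question: str
    task_step: str = ""
    hand_actions: List[str] = field(default_factory=list)
    dialogue_history: List[str] = field(default_factory=list)


def build_prompt(ctx: Context, level: int) -> str:
    """Return a prompt whose context grows with `level` (0 = question only).

    The ordering of modalities across levels is an assumption, not taken
    from the paper.
    """
    if not 0 <= level <= 3:
        raise ValueError("level must be between 0 and 3")
    parts = []
    if level >= 1 and ctx.task_step:
        parts.append(f"Current task step: {ctx.task_step}")
    if level >= 2 and ctx.hand_actions:
        parts.append("Recent hand actions: " + ", ".join(ctx.hand_actions))
    if level >= 3 and ctx.dialogue_history:
        parts.append("Dialogue so far:\n" + "\n".join(ctx.dialogue_history))
    # The user's question always comes last, after whatever context is included.
    parts.append(f"Question: {ctx.question}")
    return "\n".join(parts)


def incremental_prompts(ctx: Context) -> List[str]:
    """Build one prompt per context level, from poorest to richest."""
    return [build_prompt(ctx, level) for level in range(4)]
```

Each prompt in the returned list would be sent to the same LLM, and the answers scored against a reference, so that any change in accuracy can be attributed to the added modality.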
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Topic Modeling
