Guided Reality: Generating Visually-Enriched AR Task Guidance with LLMs and Vision Models
Ada Yi Zhao, Aditya Gunturu, Ellen Yi-Luen Do, Ryo Suzuki

TL;DR
Guided Reality is an automated AR system that combines LLMs and vision models to generate dynamic, visually-enriched guidance embedded in physical space, enhancing task understanding and execution.
Contribution
The paper introduces a system that integrates LLMs and vision models to generate spatially embedded visual guidance for AR tasks, addressing the lack of rich visual augmentation in existing LLM-based AR guidance.
Findings
The system generates multi-step instructions with accompanying visual guidance.
A user study shows improved task performance with Guided Reality.
Instructors see potential for integrating the system into training workflows.
Abstract
Large language models (LLMs) have enabled the automatic generation of step-by-step augmented reality (AR) instructions for a wide range of physical tasks. However, existing LLM-based AR guidance often lacks the rich visual augmentations needed to embed instructions effectively in their spatial context for better user understanding. We present Guided Reality, a fully automated AR system that generates embedded and dynamic visual guidance based on step-by-step instructions. Our system integrates LLMs and vision models to: 1) generate multi-step instructions from user queries, 2) identify appropriate types of visual guidance, 3) extract spatial information about key interaction points in the real world, and 4) embed visual guidance in physical space to support task execution. Drawing from a corpus of user manuals, we define five categories of visual guidance and propose an identification strategy based…
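The four stages listed in the abstract can be read as a simple data flow. The sketch below is only an illustration of that flow, not the authors' implementation: `GuidedStep`, `build_guidance`, and the three callables passed in are hypothetical names standing in for the LLM and vision-model components, and the guidance type is left as an opaque string because the five categories are not named in the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A 3D position in the AR scene (hypothetical representation).
Point3D = Tuple[float, float, float]


@dataclass
class GuidedStep:
    """One instruction plus what a renderer would need to embed it."""
    instruction: str        # text of the step (stage 1 output)
    guidance_type: str      # chosen visual guidance category (stage 2 output)
    anchors: List[Point3D]  # key interaction points in space (stage 3 output)


def build_guidance(
    query: str,
    generate_steps: Callable[[str], List[str]],     # assumed LLM wrapper
    identify_guidance: Callable[[str], str],        # assumed LLM wrapper
    locate_points: Callable[[str], List[Point3D]],  # assumed vision-model wrapper
) -> List[GuidedStep]:
    """Chain stages 1-3 and return records ready for stage 4 (AR embedding)."""
    guided: List[GuidedStep] = []
    for instruction in generate_steps(query):           # 1) multi-step instructions
        guidance_type = identify_guidance(instruction)  # 2) type of visual guidance
        anchors = locate_points(instruction)            # 3) spatial information
        guided.append(GuidedStep(instruction, guidance_type, anchors))
    return guided  # 4) an AR renderer would place visuals at each step's anchors
```

Passing the model calls in as functions keeps the sketch runnable with simple stubs and makes no claim about which models or prompts the actual system uses.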
