Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality
Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Pablo Cesar

TL;DR
This paper introduces an autonomous workflow integrating multimodal AI agents into XR environments for fine-grained training, demonstrated through a LEGO assembly assistant, and provides a new dataset and benchmarks for future research.
Contribution
The paper presents a novel autonomous workflow for multimodal AI agents in XR, together with a new dataset and a benchmark of LLMs for fine-grained training assistants.
Findings
Demonstrated a multimodal LEGO assembly training assistant in XR.
Created LEGO-MRTA, a comprehensive multimodal dataset for assembly dialogue.
Benchmarked several LLMs, showing their capabilities and limitations in XR training tasks.
Abstract
Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding the language-based environment, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates LLM with memory, planning, and interaction with XR tools and a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained…
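The agent design named in the abstract (a language agent with memory and planning that interacts with XR tools and a vision-language agent) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TrainingAgent`, the tool names `guide` and `vision`, and the keyword rule standing in for LLM planning are all hypothetical, chosen only to show how memory, planning, and tool dispatch might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TrainingAgent:
    """Toy stand-in for a language agent with memory, planning, and tools."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def plan(self, request: str) -> str:
        # Placeholder for LLM planning: a keyword rule picks the tool.
        # Hypothetical rule: visual questions go to the vision-language tool.
        text = request.lower()
        if "see" in text or "look" in text:
            return "vision"
        return "guide"

    def step(self, request: str) -> str:
        tool_name = self.plan(request)
        # Pass recent history to the tool so its reply can depend on past turns.
        context = " | ".join(self.memory[-3:])
        prompt = f"{context} >> {request}" if context else request
        reply = self.tools[tool_name](prompt)
        self.memory.append(f"{tool_name}: {request}")
        return reply


def guide_tool(prompt: str) -> str:
    # Hypothetical XR tool that would display the next assembly instruction.
    return f"[guide] {prompt}"


def vision_tool(prompt: str) -> str:
    # Hypothetical vision-language agent that would inspect the camera frame.
    return f"[vision] {prompt}"


agent = TrainingAgent(tools={"guide": guide_tool, "vision": vision_tool})
first = agent.step("What is the next brick?")
second = agent.step("Can you see if I placed it right?")
```

In this sketch the second call receives the first turn as context, which is one simple way an agent could condition its actions on past experience; a real system would replace the keyword rule with an LLM call and the string-returning tools with actual XR and vision components.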
Taxonomy
Topics: Augmented Reality Applications
