AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results

Dawar Khan; Xinyu Liu; Omar Mena; Donggang Jia; Alexandre Kouyoumdjian; Ivan Viola

arXiv:2502.15761 · cs.DC · May 14, 2026

TL;DR

AIvaluateXR introduces a comprehensive framework for benchmarking large language models on XR devices, evaluating performance, efficiency, and accuracy to guide optimal deployment strategies.

Contribution

The paper presents a novel evaluation framework and a unified method for assessing LLMs on XR hardware, together with benchmarking results for 17 models across four devices.

Findings

01

Performance varies significantly across device-model pairs.

02

The framework identifies optimal trade-offs between quality and speed.

03

On-device LLMs show competitive efficiency compared to cloud-based setups.
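
The second finding refers to trade-offs between quality and speed. A common way to make such trade-offs explicit is a Pareto front: the set of model-device pairs that no other pair beats on both axes at once. The following is a minimal sketch assuming just two higher-is-better scores per pair; the `Result` type, its field names, and the toy numbers are hypothetical, and this is not necessarily how AIvaluateXR itself ranks pairs.

```python
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical per-pair summary; field names are illustrative only.
    pair: str
    quality: float  # higher is better (e.g. a consistency score)
    speed: float    # higher is better (e.g. tokens per second)


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.quality >= b.quality and a.speed >= b.speed
            and (a.quality > b.quality or a.speed > b.speed))


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep every result that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]
```

For example, given placeholder pairs A (quality 0.9, speed 5.0), B (0.7, 12.0), and C (0.6, 8.0), C is dominated by B, so the front contains A and B. These numbers are illustrative, not benchmark results.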

Abstract

The deployment of large language models (LLMs) on extended reality (XR) devices has great potential to advance the field of human-AI interaction. In the case of direct, on-device model inference, selecting the appropriate model and device for specific tasks remains challenging. In this paper, we present AIvaluateXR, a comprehensive evaluation framework for benchmarking LLMs running on XR devices. To demonstrate the framework, we deploy 17 selected LLMs across four XR platforms: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro, and conduct an extensive evaluation. Our experimental setup measures four key metrics: performance consistency, processing speed, memory usage, and battery consumption. For each of the 68 model-device pairs, we assess performance under varying string lengths, batch sizes, and thread counts, analyzing the trade-offs for real-time XR applications. We…
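
The experimental design in the abstract is a full factorial grid: 17 models × 4 devices = 68 model-device pairs, each swept over string length, batch size, and thread count. A minimal Python sketch of such a sweep follows, assuming a caller-supplied inference callable. The model names and sweep values are hypothetical placeholders (the abstract does not list them), and using the coefficient of variation of run time as a consistency measure is an illustrative assumption, not necessarily the paper's definition.

```python
import itertools
import statistics
import time
from typing import Callable

# The four devices are named in the abstract.
DEVICES = ["Magic Leap 2", "Meta Quest 3", "Vivo X100s Pro", "Apple Vision Pro"]
# Hypothetical stand-ins: the abstract does not name the 17 models.
MODELS = [f"model_{i:02d}" for i in range(1, 18)]

# Hypothetical sweep values; the actual settings are not given in the abstract.
STRING_LENGTHS = [64, 256, 1024]
BATCH_SIZES = [1, 4]
THREAD_COUNTS = [1, 2, 4]


def model_device_pairs() -> list[tuple[str, str]]:
    """Every (model, device) combination: 17 x 4 = 68 pairs."""
    return list(itertools.product(MODELS, DEVICES))


def sweep_configs() -> list[tuple[int, int, int]]:
    """Every (string_length, batch_size, threads) setting to test per pair."""
    return list(itertools.product(STRING_LENGTHS, BATCH_SIZES, THREAD_COUNTS))


def time_runs(run: Callable[[], object], repeats: int = 5) -> tuple[float, float]:
    """Time `run` repeatedly; return (mean seconds, coefficient of variation).

    The coefficient of variation (stdev / mean) is used here as an assumed
    proxy for performance consistency: lower means steadier run times.
    """
    if repeats < 2:
        raise ValueError("need at least two runs to estimate variation")
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    mean = statistics.mean(times)
    cv = statistics.stdev(times) / mean if mean > 0 else 0.0
    return mean, cv
```

In a real harness, `run` would wrap one on-device inference call for a given pair and setting; memory and battery readings would come from platform-specific APIs, which this sketch leaves out.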

Code & Models

Repositories

github