PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models
Jiangong Chen, Mingyu Zhu, Bin Li

TL;DR
PRISM-XR is a privacy-aware framework for multi-user XR collaboration that filters sensitive data on edge devices and ensures efficient, accurate, and privacy-preserving content sharing using multimodal large language models.
Contribution
PRISM-XR introduces a novel edge-preprocessing and synchronization mechanism for privacy-preserving, multimodal LLM-based XR collaboration, addressing both privacy and efficiency challenges.
Findings
PRISM-XR achieves nearly 90% accuracy in fulfilling user requests.
It keeps registration time under 0.27 seconds.
It filters highly sensitive objects in over 90% of scenarios.
Abstract
Multimodal Large Language Models (MLLMs) enhance collaboration in Extended Reality (XR) environments by enabling flexible object and animation creation through the combination of natural language and visual inputs. However, visual data captured by XR headsets includes real-world backgrounds that may contain irrelevant or sensitive user information, such as credit cards left on the table or facial identities of other users. Uploading those frames to cloud-based MLLMs poses serious privacy risks, particularly when such data is processed without explicit user consent. Additionally, existing colocation and synchronization mechanisms in commercial XR APIs rely on time-consuming, privacy-invasive environment scanning and struggle to adapt to the highly dynamic nature of MLLM-integrated XR environments. In this paper, we propose PRISM-XR, a novel framework that facilitates multi-user…
