Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
Yanming Xiu

TL;DR
This paper presents systems to detect malicious or unsafe AR content using vision-language models, aiming to enhance safety and trustworthiness in augmented reality experiences.
Contribution
The paper introduces two systems, ViDDAR and VIM-Sense, for detecting AR content attacks, and proposes future research directions that include perceptual quality assessment and multimodal attack detection.
Findings
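ViDDAR and VIM-Sense detect AR content attacks, including content that obstructs critical information or subtly manipulates user perception.
The proposed framework aims to support scalable, human-aligned AR safety measures.
The paper identifies three future research directions: perceptually aligned quality assessment of virtual content, detection of multimodal attacks, and adaptation of VLMs for deployment on AR devices.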
Abstract
As augmented reality (AR) becomes increasingly integrated into everyday life, ensuring the safety and trustworthiness of its virtual content is critical. Our research addresses the risks of task-detrimental AR content, particularly that which obstructs critical information or subtly manipulates user perception. We developed two systems, ViDDAR and VIM-Sense, to detect such attacks using vision-language models (VLMs) and multimodal reasoning modules. Building on this foundation, we propose three future directions: automated, perceptually aligned quality assessment of virtual content; detection of multimodal attacks; and adaptation of VLMs for efficient and user-centered deployment on AR devices. Overall, our work aims to establish a scalable, human-aligned framework for safeguarding AR experiences and seeks feedback on perceptual modeling, multimodal AR content implementation, and…
Taxonomy
Topics: Virtual Reality Applications and Impacts · Augmented Reality Applications
