ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality

Yanming Xiu; Tim Scargill; Maria Gorlatova

arXiv:2501.12553 · cs.CV · September 4, 2025

TL;DR

This paper introduces ViDDAR, a novel vision-language model-based system designed to detect task-detrimental virtual content in augmented reality, addressing obstruction and information manipulation attacks to improve user task performance.

Contribution

ViDDAR is the first system to use vision-language models (VLMs) to detect task-detrimental virtual content in AR, pairing them with a user-edge-cloud architecture to support real-time monitoring.

Findings

1. Achieves 92.15% accuracy in obstruction detection
2. Detects information manipulation with 82.46% accuracy
3. Operates with low latency for obstruction detection (533 ms)

Abstract

In Augmented Reality (AR), virtual content enhances user experience by providing additional information. However, improperly positioned or designed virtual content can be detrimental to task performance, as it can impair users' ability to accurately interpret real-world information. In this paper we examine two types of task-detrimental virtual content: obstruction attacks, in which virtual content prevents users from seeing real-world objects, and information manipulation attacks, in which virtual content interferes with users' ability to accurately interpret real-world information. We provide a mathematical framework to characterize these attacks and create a custom open-source dataset for attack evaluation. To address these attacks, we introduce ViDDAR (Vision language model-based Task-Detrimental content Detector for Augmented Reality), a comprehensive full-reference system that…
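
The abstract defines an obstruction attack as virtual content that prevents users from seeing real-world objects. The sketch below is a hedged geometric illustration of that definition only. It is not ViDDAR's VLM-based detector. It assumes that each task-relevant real object and each virtual element can be approximated by an axis-aligned bounding box in screen pixels. It flags an obstruction when any single virtual box covers at least a fraction `threshold` of the object. The names `Box`, `occluded_fraction`, and `is_obstruction`, as well as the 0.5 default threshold, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp negative extents to zero so empty intersections have zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def occluded_fraction(obj: Box, virtual: Box) -> float:
    """Fraction of the real object's box that one virtual content box covers."""
    inter = Box(
        max(obj.x1, virtual.x1),
        max(obj.y1, virtual.y1),
        min(obj.x2, virtual.x2),
        min(obj.y2, virtual.y2),
    )
    obj_area = obj.area()
    return inter.area() / obj_area if obj_area > 0 else 0.0


def is_obstruction(obj: Box, virtual_boxes: list[Box], threshold: float = 0.5) -> bool:
    """Assumed rule: flag when any single virtual box hides >= threshold of the object.

    Taking the max per box (rather than summing) avoids double-counting
    regions where several virtual boxes overlap each other.
    """
    return any(occluded_fraction(obj, v) >= threshold for v in virtual_boxes)


if __name__ == "__main__":
    sign = Box(100, 100, 200, 200)   # hypothetical real-world object (e.g., a sign)
    label = Box(150, 100, 300, 250)  # hypothetical virtual label covering its right half
    print(occluded_fraction(sign, label))  # 0.5
    print(is_obstruction(sign, [label]))   # True
```

A box-overlap rule like this cannot tell whether the hidden region actually matters for the user's task. That question is the one the paper delegates to a vision-language model, so this sketch should be read only as a way to make the obstruction definition concrete.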

Code & Models

Repositories

ym-xiu/viddar-dataset (Official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems · Augmented Reality Applications