Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
Yanming Xiu, Maria Gorlatova

TL;DR
This paper presents VIM-Sense, a multimodal semantic reasoning framework for detecting visual information manipulation attacks in augmented reality. It reaches 88.94% detection accuracy on the AR-VIM dataset with an average latency of around 7 seconds.
Contribution
It introduces a taxonomy of visual information manipulation attacks, a new dataset (AR-VIM), and a multimodal detection method that combines vision-language models and OCR for AR security.
Findings
VIM-Sense achieves 88.94% detection accuracy on AR-VIM.
The system operates with an average latency of around 7 seconds.
The approach outperforms vision-only and text-only baselines.
Abstract
The virtual content in augmented reality (AR) can introduce misleading or harmful information, leading to semantic misunderstandings or user errors. In this work, we focus on visual information manipulation (VIM) attacks in AR, where virtual content changes the meaning of real-world scenes in subtle but impactful ways. We introduce a taxonomy that categorizes these attacks into three formats: character, phrase, and pattern manipulation, and three purposes: information replacement, information obfuscation, and extra wrong information. Based on the taxonomy, we construct a dataset, AR-VIM, which consists of 452 raw-AR video pairs spanning 202 different scenes, each simulating a real-world AR scenario. To detect the attacks in the dataset, we propose a multimodal semantic reasoning framework, VIM-Sense. It combines the language and visual understanding capabilities of vision-language…
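The two-axis taxonomy described in the abstract (three formats, three purposes) can be sketched as a small data structure. This is only an illustrative sketch: the class names `AttackFormat`, `AttackPurpose`, and `VIMLabel`, and the helper `all_labels`, are hypothetical and do not come from the paper or its dataset. The text here also does not say whether every format/purpose combination actually occurs in AR-VIM, so the enumeration below is just the full cross product.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AttackFormat(Enum):
    """The three attack formats named in the abstract."""
    CHARACTER = "character manipulation"
    PHRASE = "phrase manipulation"
    PATTERN = "pattern manipulation"


class AttackPurpose(Enum):
    """The three attack purposes named in the abstract."""
    INFORMATION_REPLACEMENT = "information replacement"
    INFORMATION_OBFUSCATION = "information obfuscation"
    EXTRA_WRONG_INFORMATION = "extra wrong information"


@dataclass(frozen=True)
class VIMLabel:
    """Hypothetical label pairing one format with one purpose."""
    fmt: AttackFormat
    purpose: AttackPurpose


def all_labels():
    """Return every format/purpose pair (3 x 3 cross product)."""
    return [VIMLabel(f, p) for f, p in product(AttackFormat, AttackPurpose)]


if __name__ == "__main__":
    for label in all_labels():
        print(f"{label.fmt.value} / {label.purpose.value}")
```

Keeping format and purpose as separate enums mirrors the abstract's description of two independent axes, so a sample can be tagged on each axis without defining a flat list of combined categories.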
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · User Authentication and Security Systems
