SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses

Zhuohang Jiang; Xu Yuan; Haohao Qu; Shanru Lin; Kanglong Liu; Wenqi Fan; Qing Li

arXiv:2602.22683·cs.CV·April 10, 2026

TL;DR

This paper introduces SUPERGLASSES, a real-world VQA benchmark for smart glasses, and proposes SUPERLENS, a retrieval-augmented multimodal agent that significantly improves performance.

Contribution

The paper provides the first comprehensive VQA benchmark built on real-world data collected by smart glasses, and develops SUPERLENS, a novel multimodal agent that outperforms existing models.

Findings

01

An evaluation of 26 VLMs reveals significant performance gaps.

02

SUPERLENS achieves state-of-the-art results, surpassing GPT-4o by 2.19%.

03

The dataset and benchmark are publicly available for further research.

Abstract

The rapid advancement of AI-powered smart glasses, one of the hottest wearable devices, has unlocked new frontiers for multimodal interaction, with Visual Question Answering (VQA) over external knowledge sources emerging as a core application. Existing Vision Language Models (VLMs) adapted to smart glasses are typically trained and evaluated on traditional multimodal datasets; however, these datasets lack the variety and realism needed to reflect smart glasses usage scenarios and diverge from their specific challenges, where accurately identifying the object of interest must precede any external knowledge retrieval. To bridge this gap, we introduce SUPERGLASSES, the first comprehensive VQA benchmark built on real-world data entirely collected by smart glasses devices. SUPERGLASSES comprises 2,422 egocentric image-question pairs spanning 14 image domains and 8 query categories, enriched…

Code & Models

Repositories

https://huggingface.co/datasets/xandery/SuperGlasses

Datasets

xandery/SuperGlasses
dataset · 196 downloads
