"Less is More": Reducing Cognitive Load and Task Drift in Real-Time Multimodal Assistive Agents for the Visually Impaired
Yi Zhao, Siqi Wang, Qiqun Geng, Erxin Yu, Jing Li

TL;DR
This paper introduces VIA-Agent, a multimodal assistive system for the visually impaired that reduces cognitive load and task drift, improving usability and efficiency in real-world navigation and object retrieval tasks.
Contribution
The paper presents VIA-Agent, a novel co-optimized system that enhances real-time visual assistance for the visually impaired by minimizing cognitive load and task drift.
Findings
VIA-Agent outperformed BeMyAI in task success and user satisfaction.
Reduced mean task time by approximately 40% compared to baseline systems.
Lowered perceived cognitive load and task drift, increasing system usability.
Abstract
Vision-Language Models (VLMs) enable on-demand visual assistance, yet current applications for people with visual impairments (PVI) impose high cognitive load and exhibit task drift, limiting real-world utility. We first conducted a formative study with 15 PVI and identified three requirements for visually impaired assistance (VIA): low latency for real-time use, minimal cognitive load, and hallucination-resistant responses to sustain trust. Informed by the formative study, we present VIA-Agent, a prototype that co-optimizes its cognitive 'brain' and interactive 'body'. The brain implements a goal-persistent design with calibrated conciseness to produce brief, actionable guidance; the body adopts a real-time communication (RTC) embodiment, evolving from a request-response Model Context Protocol (MCP) pipeline, to support fluid interaction. We evaluated VIA-Agent with 9 PVI across…
Taxonomy
Topics: Tactile and Sensory Interactions · Gaze Tracking and Assistive Technology · Human-Automation Interaction and Safety
