Audo-Sight: AI-driven Ambient Perception Across Edge-Cloud for Blind and Low Vision Users

Jacob Bradshaw; Mohsen Riahi Alam; Bhanuja Ainary; Minseo Kim; Mohsen Amini Salehi

arXiv:2603.13668 · cs.DC · March 17, 2026

TL;DR

Audo-Sight is an AI-driven assistive system for blind and low-vision users that combines edge and cloud processing to deliver faster, more accurate scene descriptions through voice interaction, enhancing accessibility.

Contribution

The paper introduces a novel edge-cloud architecture with a Response Fusion Engine that improves response speed and accuracy for assistive scene perception.

Findings

01. 80% faster speech output for urgent tasks

02. 50% faster complete responses overall

03. Preferred over GPT-5 by 62% of users

Abstract

Despite advances in assistive technologies, Blind and Low-Vision (BLV) individuals continue to face challenges in understanding their surroundings. Delivering concise, useful, and timely scene descriptions for ambient perception remains a long-standing accessibility problem. To address this, we introduce Audo-Sight, an AI-driven assistive system across Edge-Cloud that enables BLV individuals to perceive their surroundings through voice-based conversational interaction. Audo-Sight employs a set of expert and generic AI agents, each supported by dedicated processing pipelines distributed across edge and cloud. It analyzes user queries by considering urgency and contextual information to infer the user intent and dynamically route each query, along with a scene frame, to the most suitable pipeline. In cases where users require fast responses, the system simultaneously leverages edge and…
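The urgency-aware routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword cues, the `Query` type, the `is_urgent` and `route` helpers, and the pipeline labels `"edge"` and `"cloud"` are all hypothetical. The sketch also assumes that non-urgent queries default to a cloud pipeline, and that "fast responses" means dispatching to an edge and a cloud pipeline at the same time.

```python
from dataclasses import dataclass

# Hypothetical urgency cues. The paper's actual intent analysis also uses
# contextual information and is not reproduced here.
URGENT_CUES = ("danger", "obstacle", "quick", "hurry")


@dataclass
class Query:
    text: str      # transcribed voice query
    frame_id: int  # stand-in for the scene frame sent along with the query


def is_urgent(query: Query) -> bool:
    """Return True if the query text contains any of the assumed urgency cues."""
    lowered = query.text.lower()
    return any(cue in lowered for cue in URGENT_CUES)


def route(query: Query) -> list[str]:
    """Return the (hypothetical) pipelines that should handle this query."""
    if is_urgent(query):
        # Fast-response case: use edge and cloud simultaneously.
        return ["edge", "cloud"]
    # Assumption: non-urgent queries go to a single cloud pipeline.
    return ["cloud"]
```

Under these assumptions, `route(Query("Quick, is there an obstacle ahead?", 1))` selects both pipelines, while a request such as "Describe the painting on the wall." selects only the cloud pipeline. A real system would replace the keyword check with a learned intent classifier.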

Taxonomy

Topics: Tactile and Sensory Interactions · Multimodal Machine Learning Applications · Social Robot Interaction and HRI