Audo-Sight: AI-driven Ambient Perception Across Edge-Cloud for Blind and Low Vision Users
Jacob Bradshaw, Mohsen Riahi Alam, Bhanuja Ainary, Minseo Kim, Mohsen Amini Salehi

TL;DR
Audo-Sight is an AI-driven assistive system for blind and low-vision users that combines edge and cloud processing to deliver faster, more accurate scene descriptions through voice interaction, enhancing accessibility.
Contribution
The paper introduces a novel edge-cloud architecture with a Response Fusion Engine that improves response speed and accuracy for assistive scene perception.
Findings
80% faster speech output for urgent tasks
50% faster complete responses overall
Preferred over GPT-5 by 62% of users
Abstract
Despite advances in assistive technologies, Blind and Low-Vision (BLV) individuals continue to face challenges in understanding their surroundings. Delivering concise, useful, and timely scene descriptions for ambient perception remains a long-standing accessibility problem. To address this, we introduce Audo-Sight, an AI-driven assistive system across Edge-Cloud that enables BLV individuals to perceive their surroundings through voice-based conversational interaction. Audo-Sight employs a set of expert and generic AI agents, each supported by dedicated processing pipelines distributed across edge and cloud. It analyzes user queries by considering urgency and contextual information to infer the user intent and dynamically route each query, along with a scene frame, to the most suitable pipeline. In cases where users require fast responses, the system simultaneously leverages edge and…
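The routing step described in the abstract (infer urgency and intent from the query, then pick a pipeline) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword lists, the pipeline names (`text_reading`, `navigation`, `generic`), and the `route_query` function are all hypothetical, and a simple keyword match stands in for whatever intent inference the system actually uses.

```python
from dataclasses import dataclass

# Hypothetical keyword sets; the abstract does not say how urgency
# or intent is actually detected.
URGENT_WORDS = {"now", "quick", "quickly", "hurry", "danger", "careful"}

# Hypothetical mapping from trigger words to expert pipelines.
EXPERT_TOPICS = {
    "read": "text_reading",
    "sign": "text_reading",
    "cross": "navigation",
    "obstacle": "navigation",
}


@dataclass
class Route:
    pipeline: str  # name of the selected (hypothetical) pipeline
    urgent: bool   # whether the query was judged time-critical


def route_query(query: str) -> Route:
    """Pick a pipeline for a spoken query using naive keyword matching."""
    # Strip basic punctuation so "Quick," matches "quick".
    cleaned = query.lower()
    for ch in "?.,!":
        cleaned = cleaned.replace(ch, " ")
    words = cleaned.split()

    urgent = any(w in URGENT_WORDS for w in words)

    # First matching trigger word wins; otherwise fall back to the generic agent.
    pipeline = "generic"
    for w in words:
        if w in EXPERT_TOPICS:
            pipeline = EXPERT_TOPICS[w]
            break

    return Route(pipeline=pipeline, urgent=urgent)


if __name__ == "__main__":
    for q in ["What does that sign say?",
              "Quick, is there an obstacle ahead?",
              "Describe the room."]:
        print(q, "->", route_query(q))
```

In a real system the `urgent` flag would decide how the edge and cloud pipelines are used for the query; that policy is left out here because the abstract excerpt is cut off before describing it.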
Taxonomy
Topics: Tactile and Sensory Interactions · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
