SeeSay: An Assistive Device for the Visually Impaired Using Retrieval Augmented Generation
Melody Yu

TL;DR
SeeSay is an assistive device for the visually impaired that uses large language models and retrieval-augmented generation to recognize surroundings and provide audio guidance, enhancing independence and navigation.
Contribution
This paper introduces SeeSay, a novel system combining LLMs and RAG for environmental recognition and audio feedback for visually impaired users.
Findings
Effective recognition of surroundings in diverse settings
Successful audio responses to user queries
Enhanced environmental perception and navigation
Abstract
In this paper, we present SeeSay, an assistive device designed for individuals with visual impairments. This system leverages large language models (LLMs) for speech recognition and visual querying. It effectively identifies, records, and responds to the user's environment by providing audio guidance using retrieval-augmented generation (RAG). Our experiments demonstrate the system's capability to recognize its surroundings and respond to queries with audio feedback in diverse settings. We hope that the SeeSay system will facilitate users' comprehension and recollection of their surroundings, thereby enhancing their environmental perception, improving navigational capabilities, and boosting overall independence.
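The record-then-retrieve pattern that the abstract attributes to RAG can be illustrated with a minimal sketch. This is not the SeeSay implementation: the names `SceneMemory`, `record`, `retrieve`, and `build_prompt` are hypothetical, the scene descriptions are invented for illustration, and simple word-overlap cosine similarity stands in for whatever embedding model and LLM the actual system uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SceneMemory:
    """Hypothetical store of text descriptions of what the device has seen."""

    def __init__(self):
        self.records = []  # list of (description, bag-of-words) pairs

    def record(self, description):
        """Save one scene description for later retrieval."""
        self.records.append((description, Counter(tokenize(description))))

    def retrieve(self, query, k=2):
        """Return the k stored descriptions most similar to the query."""
        q = Counter(tokenize(query))
        ranked = sorted(self.records, key=lambda r: cosine(q, r[1]), reverse=True)
        return [description for description, _ in ranked[:k]]


def build_prompt(query, context):
    """Combine retrieved descriptions with the user's question.

    In a full system this prompt would go to an LLM and the answer would be
    converted to speech; both steps are omitted here.
    """
    lines = ["Known surroundings:"]
    lines += ["- " + c for c in context]
    lines.append("Question: " + query)
    return "\n".join(lines)


memory = SceneMemory()
memory.record("A red mug is on the kitchen counter next to the sink.")
memory.record("The front door is straight ahead, about three steps away.")
memory.record("Keys are on the small table by the sofa.")

question = "Where are my keys?"
prompt = build_prompt(question, memory.retrieve(question, k=1))
print(prompt)
```

Retrieving only the most relevant stored descriptions keeps the prompt short, which is the usual motivation for RAG over passing the entire recorded history to the model.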
Taxonomy
Topics: Tactile and Sensory Interactions · Digital Accessibility for Disabilities · Smart Parking Systems Research
