SeeSay: An Assistive Device for the Visually Impaired Using Retrieval Augmented Generation

Melody Yu

arXiv:2410.03771 · cs.HC · October 8, 2024

TL;DR

SeeSay is an assistive device for the visually impaired that uses large language models and retrieval-augmented generation to recognize surroundings and provide audio guidance, enhancing independence and navigation.

Contribution

This paper introduces SeeSay, a novel system combining LLMs and RAG for environmental recognition and audio feedback for visually impaired users.

Findings

01

Effective recognition of surroundings in diverse settings

02

Successful audio responses to user queries

03

Enhanced environmental perception and navigation

Abstract

In this paper, we present SeeSay, an assistive device designed for individuals with visual impairments. This system leverages large language models (LLMs) for speech recognition and visual querying. It effectively identifies, records, and responds to the user's environment by providing audio guidance using retrieval-augmented generation (RAG). Our experiments demonstrate the system's capability to recognize its surroundings and respond to queries with audio feedback in diverse settings. We hope that the SeeSay system will facilitate users' comprehension and recollection of their surroundings, thereby enhancing their environmental perception, improving navigational capabilities, and boosting overall independence.
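The record-then-retrieve loop described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not specify how SeeSay stores or retrieves scene information, so everything below is an assumption made for illustration. The names `SceneMemory`, `record`, `retrieve`, and `build_prompt` are hypothetical, the sample descriptions are invented, and a simple bag-of-words cosine similarity stands in for whatever retrieval method the real system uses. Speech recognition, the visual LLM, and audio output are omitted entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SceneMemory:
    """Hypothetical store of text descriptions of what the device has seen."""

    def __init__(self):
        self.records = []  # list of (description, token-count vector)

    def record(self, description):
        """Save one scene description (assumed to come from a vision model)."""
        self.records.append((description, Counter(tokenize(description))))

    def retrieve(self, query, k=2):
        """Return up to k stored descriptions that share words with the query."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), desc) for desc, vec in self.records]
        scored = [s for s in scored if s[0] > 0]  # drop non-matching records
        scored.sort(key=lambda s: s[0], reverse=True)
        return [desc for _, desc in scored[:k]]


def build_prompt(query, retrieved):
    """Combine retrieved context and the user's question into one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented example data, for illustration only.
memory = SceneMemory()
memory.record("A red backpack is on the chair near the door.")
memory.record("The kitchen counter has a kettle and two mugs.")
memory.record("Keys are on the table by the window.")

hits = memory.retrieve("Where are my keys?", k=1)
prompt = build_prompt("Where are my keys?", hits)
print(prompt)
```

In a full pipeline of this kind, the prompt would be passed to a language model and the answer converted to speech; a learned embedding model would likely replace the word-overlap scoring used here.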

Taxonomy

Topics: Tactile and Sensory Interactions · Digital Accessibility for Disabilities · Smart Parking Systems Research