Developing an AI-Guided Assistant Device for the Deaf and Hearing Impaired
Jiayu (Jerry) Liu

TL;DR
This paper presents a deep learning-based system for aiding deaf and hearing-impaired individuals by localizing and identifying sounds in real time using multimodal data and custom hardware.
Contribution
It introduces a novel multimodal AI system combining sound localization, classification, and visual data integration for accessibility devices.
Findings
JerryNet achieved 91.1% precision in sound direction detection.
The fine-tuned CLAP model reached 98.5% accuracy on a custom dataset.
The localization model reached a cIoU of 0.892, outperforming similar models.
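The summary does not define cIoU. The sketch below assumes it refers to the consensus intersection over union commonly reported in sound-source localization benchmarks, where the ground truth is a per-pixel agreement map built from several annotators. The function name `ciou`, the 0.5 threshold, and the toy 2x2 maps are hypothetical and chosen only for illustration; this is not the paper's evaluation code.

```python
def ciou(pred, consensus, threshold=0.5):
    """Consensus IoU between a predicted heatmap and a consensus map.

    pred and consensus are equally shaped 2D lists with values in [0, 1].
    Assumed definition: consensus weight captured by the thresholded
    prediction, divided by the total consensus weight plus the number of
    predicted pixels that no annotator marked.
    """
    captured = 0.0    # consensus weight inside the predicted region
    total = 0.0       # consensus weight over the whole map
    false_area = 0    # predicted pixels with zero consensus
    for pred_row, gt_row in zip(pred, consensus):
        for p, g in zip(pred_row, gt_row):
            total += g
            if p > threshold:
                if g > 0:
                    captured += g
                else:
                    false_area += 1
    denom = total + false_area
    return captured / denom if denom else 0.0


# Toy example: the top row is the annotated sound source.
consensus_map = [[1.0, 1.0],
                 [0.0, 0.0]]
predicted_map = [[0.9, 0.8],
                 [0.1, 0.7]]
score = ciou(predicted_map, consensus_map)
```

In the toy example the prediction covers both annotated pixels and one unannotated pixel, so the score is 2 / (2 + 1). Under this assumed definition, a value of 0.892 would indicate that the predicted regions cover most of the annotated source area with little spillover.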
Abstract
This study aims to develop a deep learning system for an accessibility device for people who are deaf or hearing impaired. The device is intended to localize and identify sound sources accurately and in real time. The study addresses a gap in current research by applying machine learning techniques to serve an underprivileged community. The system includes three main components. 1. JerryNet: A custom-designed CNN architecture that determines the direction of arrival (DoA) among nine possible directions. 2. Audio classification: This model fine-tunes the Contrastive Language-Audio Pretraining (CLAP) model to identify sound classes based only on audio. 3. Multimodal integration model: This is a sound localization model that combines audio, visual, and text data to locate the sound sources in images. This component consists of two modules, one object detection…
