Developing an AI-Guided Assistant Device for the Deaf and Hearing Impaired

Jiayu (Jerry) Liu

arXiv:2507.14215·cs.LG·July 22, 2025

TL;DR

This paper presents a deep learning-based system for aiding deaf and hearing-impaired individuals by localizing and identifying sounds in real time using multimodal data and custom hardware.

Contribution

It introduces a novel multimodal AI system combining sound localization, classification, and visual data integration for accessibility devices.

Findings

01

JerryNet achieved 91.1% precision in sound direction detection.

02

The fine-tuned CLAP model reached 98.5% accuracy on a custom dataset.

03

The localization model achieved a cIoU of 0.892, outperforming similar models.
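
To make the first finding concrete, here is a minimal sketch of how precision could be computed for a nine-way direction classifier. It is an illustration only: the page does not say how the 91.1% figure was averaged, so the macro-averaging used here, the function names, and the toy labels are all assumptions rather than the paper's evaluation code.

```python
from collections import Counter

NUM_DIRECTIONS = 9  # nine direction classes, per the abstract; class indices 0..8 are an assumption


def per_class_precision(y_true, y_pred, num_classes=NUM_DIRECTIONS):
    """Return precision for each class: true positives / all predictions of that class."""
    true_positives = Counter()
    predicted = Counter()
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] += 1
        if truth == guess:
            true_positives[guess] += 1
    # A class that is never predicted gets precision 0.0 here (a convention, not a rule).
    return [
        true_positives[c] / predicted[c] if predicted[c] else 0.0
        for c in range(num_classes)
    ]


def macro_precision(y_true, y_pred, num_classes=NUM_DIRECTIONS):
    """Unweighted mean of the per-class precisions (assumed averaging scheme)."""
    scores = per_class_precision(y_true, y_pred, num_classes)
    return sum(scores) / num_classes


# Toy labels, invented for illustration: one clip from direction 2 is misread as direction 3.
toy_true = [0, 1, 2, 2, 3, 4, 5, 6, 7, 8]
toy_pred = [0, 1, 2, 3, 3, 4, 5, 6, 7, 8]
toy_scores = per_class_precision(toy_true, toy_pred)
toy_macro = macro_precision(toy_true, toy_pred)
```

In this toy case only direction 3 loses precision, because one of its two predictions is wrong. A micro-averaged figure (total correct predictions over total predictions) would weight frequent directions more heavily and could differ from the macro average.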

Abstract

This study aims to develop a deep learning system for an accessibility device for the deaf or hearing impaired. The device will accurately localize and identify sound sources in real time. The study fills an important gap in current research by applying machine learning techniques to serve an underprivileged community. The system includes three main components:

1. JerryNet: A custom-designed CNN architecture that determines the direction of arrival (DoA) among nine possible directions.
2. Audio classification: A model, obtained by fine-tuning the Contrastive Language-Audio Pretraining (CLAP) model, that identifies the exact sound class from audio alone.
3. Multimodal integration model: An accurate sound localization model that combines audio, visual, and text data to locate the exact sound sources in images. This part consists of two modules, one object detection…
