Identifying Hearing Difficulty Moments in Conversational Audio

Jack Collins; Adrian Buzea; Chris Collier; Alejandro Ballesta Rosen; Julian Maclaren; Richard F. Lyon; Simon Carlile

arXiv:2507.23590·cs.SD·August 1, 2025

Open Access

TL;DR

This paper presents machine learning methods for detecting moments of hearing difficulty in conversational audio and shows that multimodal audio language models outperform both an ASR hotword heuristic and a fine-tuned Wav2Vec model.

Contribution

It proposes and compares machine learning solutions for continuously detecting utterances that mark hearing difficulty moments, and finds that multimodal audio language models perform best.

Findings

1. Audio language models outperform ASR heuristics

2. Multimodal models excel in detecting hearing difficulty moments

3. Proposed methods improve real-time hearing assistance accuracy

Abstract

Individuals regularly experience Hearing Difficulty Moments in everyday conversation. Identifying these moments has particular significance in the field of hearing assistive technology, where timely interventions are key for real-time hearing assistance. In this paper, we propose and compare machine learning solutions for continuously detecting utterances that identify these specific moments in conversational audio. We show that audio language models, through their multimodal reasoning capabilities, excel at this task, significantly outperforming both a simple automatic speech recognition (ASR) hotword heuristic and a more conventional fine-tuning approach with Wav2Vec, an audio-only input architecture that is state of the art for ASR.
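To make the baseline concrete, here is a minimal sketch of what an ASR hotword heuristic could look like once utterances have been transcribed. It is an illustration only, not the authors' implementation: the phrase list `HOTWORDS`, the function name `flag_hearing_difficulty`, and the sample transcript are all assumptions, since the abstract does not specify them.

```python
import re

# Hypothetical hotword list (an assumption for illustration);
# the abstract does not say which phrases the paper's heuristic uses.
HOTWORDS = (
    "pardon",
    "sorry",
    "what did you say",
    "say that again",
    "come again",
    "huh",
)

# One case-insensitive pattern that matches any hotword on word boundaries.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in HOTWORDS) + r")\b",
    re.IGNORECASE,
)


def flag_hearing_difficulty(utterances):
    """Return the indices of transcribed utterances that contain a hotword."""
    return [i for i, text in enumerate(utterances) if _PATTERN.search(text)]


# Made-up ASR output, one string per utterance.
transcript = [
    "I booked the table for seven.",
    "Sorry, what did you say?",
    "The table is booked for seven.",
    "Oh great, thanks.",
]
flagged = flag_hearing_difficulty(transcript)
```

A rule like this sees only the transcribed words, so it would also flag "sorry" used as an apology and would miss any difficulty expressed without a listed phrase. That limitation is consistent with the abstract's finding that models reasoning over the audio itself perform better.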

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation