Optimization of a Real-Time Wavelet-Based Algorithm for Improving Speech Intelligibility

Tianqu Kang, Anh-Dung Dinh, Binghong Wang, Tianyuan Du, Yijia Chen, and Kevin Chau (Hong Kong University of Science and Technology)

arXiv:2202.02545·cs.SD·July 25, 2022·1 cites

Open Access

TL;DR

This paper presents a real-time wavelet-based algorithm that enhances speech intelligibility by adjusting sub-band gains, improving transcription accuracy under various noise and hearing loss conditions, with applications in hearing aids and speech processing.

Contribution

It introduces a simplified, real-time wavelet-based method for speech enhancement that effectively improves intelligibility across different noise levels and hearing impairments.

Findings

01. 16.9 percentage-point average increase in transcription accuracy for noise-free speech

02. 9.5% increase in transcription accuracy for noisy speech

03. Universal sub-band gains effective up to 4.8 dB noise-to-signal ratio
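
To make the 4.8 dB figure concrete, the sketch below mixes noise into a signal at a chosen noise-to-signal ratio. It assumes the ratio is defined as 10·log10(noise energy / speech energy), which is the usual convention but is not confirmed by this page; under that assumption, 4.8 dB means the noise carries about three times the energy of the speech. The function names (`mix_at_nsr`, `measure_nsr_db`) and the synthetic sine and Gaussian test signals are hypothetical stand-ins, not the paper's data or code.

```python
import math
import random


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def mix_at_nsr(speech, noise, nsr_db):
    """Scale `noise` so that 10*log10(E_noise / E_speech) equals `nsr_db`,
    then add it to `speech`. Returns (noisy_signal, scaled_noise)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_energy = energy(speech) * 10 ** (nsr_db / 10)
    scale = math.sqrt(target_noise_energy / energy(noise))
    scaled = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measure_nsr_db(speech, noise):
    """Noise-to-signal ratio in dB (assumed energy-ratio definition)."""
    return 10 * math.log10(energy(noise) / energy(speech))


# Hypothetical stand-ins: a sine wave for speech, Gaussian samples for noise.
random.seed(0)
speech = [math.sin(0.05 * n) for n in range(1000)]
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
noisy, scaled_noise = mix_at_nsr(speech, noise, 4.8)
```

Because the noise is rescaled rather than the speech, the speech component of `noisy` is unchanged, which keeps comparisons across ratios on the same reference signal.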

Abstract

This paper reports the optimization of a wavelet-based algorithm to improve speech intelligibility, along with the full data set and results. The discrete-time speech signal is split into frequency sub-bands via a multi-level discrete wavelet transform. Various gains are applied to the sub-band signals before they are recombined to form a modified version of the speech. The sub-band gains are adjusted while keeping the overall signal energy unchanged, and the speech intelligibility under various background interference and simulated hearing loss conditions is enhanced and evaluated objectively and quantitatively using Google Speech-to-Text transcription. A universal set of sub-band gains can work over a range of noise-to-signal ratios up to 4.8 dB. For noise-free speech, overall intelligibility is improved, and the Google transcription accuracy is increased by 16.9 percentage points on average…
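
The pipeline described in the abstract (multi-level wavelet decomposition, per-band gains, recombination, energy kept constant) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a Haar wavelet written out by hand so the block has no dependencies, whereas the abstract does not name the wavelet family or the number of levels. The helper names (`wavedec`, `waverec`, `apply_subband_gains`), the three-level split, and the gain values are all hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level by interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def wavedec(x, levels):
    """Multi-level decomposition, ordered [approx_L, detail_L, ..., detail_1]
    (lowest-frequency band first)."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def waverec(bands):
    """Recombine sub-bands produced by `wavedec`."""
    approx = bands[0]
    for detail in bands[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


def energy(x):
    return sum(v * v for v in x)


def apply_subband_gains(x, gains):
    """Scale each sub-band by its gain, recombine, then rescale the result
    so its total energy equals that of the input."""
    bands = wavedec(x, len(gains) - 1)
    scaled = [[g * c for c in band] for g, band in zip(gains, bands)]
    y = waverec(scaled)
    e_out = energy(y)
    if e_out == 0.0:
        return y
    k = math.sqrt(energy(x) / e_out)
    return [k * v for v in y]


# Hypothetical input and gains: two sinusoids, three levels, four sub-bands.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(2.0 * n) for n in range(64)]
example_gains = [0.5, 1.0, 2.0, 1.0]  # illustrative values, not from the paper
modified = apply_subband_gains(signal, example_gains)
```

The final rescaling step is one simple way to satisfy the constant-energy constraint: the gains then only set the relative distribution of energy across sub-bands, not the overall loudness. A real-time version would also need to process the signal in frames, which this sketch leaves out.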

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Structural Health Monitoring Techniques