Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities

Andrew Catellier; Stephen Voran

arXiv:2206.13272 · eess.AS · November 21, 2023

TL;DR

WAWEnets are efficient convolutional neural networks that evaluate speech quality directly from wideband audio waveforms, accurately predicting multiple objective and subjective speech quality metrics without needing reference signals.

Contribution

This work introduces a unified, efficient WAWEnet architecture capable of estimating multiple speech quality and intelligibility metrics simultaneously from raw audio.
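The idea of one no-reference network producing several estimates from raw audio can be illustrated with a toy example. The sketch below is not the WAWEnet architecture: the class name `TinyWaveformEstimator`, the layer sizes, and the random, untrained weights are all hypothetical, so the output values carry no meaning. It only shows the interface shape, where the sole input is the waveform under test (no reference signal) and a single forward pass returns one value per target.

```python
import random


def conv1d(signal, kernel, stride):
    """Strided 1-D cross-correlation with no padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


class TinyWaveformEstimator:
    """Hypothetical toy model: waveform in, one estimate per target out."""

    def __init__(self, n_targets, n_channels=4, kernel_size=9, stride=4, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the example repeatable
        self.kernel_size = kernel_size
        self.stride = stride
        # Random, untrained convolution kernels (one per channel).
        self.kernels = [
            [rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]
            for _ in range(n_channels)
        ]
        # Linear head: one row of weights per target.
        self.head = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_channels)]
            for _ in range(n_targets)
        ]

    def __call__(self, waveform):
        if len(waveform) < self.kernel_size:
            raise ValueError("waveform is shorter than the convolution kernel")
        features = []
        for kernel in self.kernels:
            # Convolve, apply ReLU, then average over time (global pooling).
            activations = [max(0.0, a) for a in conv1d(waveform, kernel, self.stride)]
            features.append(sum(activations) / len(activations))
        # Every target is computed from the same shared features.
        return [sum(w * f for w, f in zip(row, features)) for row in self.head]


# Usage: a synthetic stand-in for a speech segment; no reference signal is passed.
rng = random.Random(1)
waveform = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
model = TinyWaveformEstimator(n_targets=7)  # seven targets, as in the first finding
estimates = model(waveform)
print(len(estimates))
```

Sharing the convolutional features across all targets is what lets a single pass serve several metrics at once; a trained model would learn those features from labeled speech rather than drawing them at random.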

Findings

01 A single WAWEnet tracks seven quality and intelligibility metrics.

02 A second network estimates four subjective speech quality dimensions.

03 A third network achieves high agreement on subjective quality scores.

Abstract

Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms in order to produce evaluations of those waveforms. In the present work these evaluations give qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require "reference" (original or undistorted) versions of the waveforms they evaluate. Our initial WAWEnet publication introduced four WAWEnets, and each emulated the output of an established full-reference speech quality or intelligibility estimation algorithm. We have updated the WAWEnet architecture to be more efficient and effective. Here we present a single WAWEnet that closely tracks seven different quality and intelligibility values. We create a second network that additionally tracks four…

Code & Models

Repositories

ntia/wenets
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing