Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities
Andrew Catellier, Stephen Voran

TL;DR
WAWEnets are efficient convolutional neural networks that evaluate speech quality directly from wideband audio waveforms, accurately predicting multiple objective and subjective speech quality metrics without needing reference signals.
Contribution
This work presents an updated, more efficient WAWEnet architecture that can estimate multiple speech quality and intelligibility metrics simultaneously from raw audio waveforms.
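To make the "one waveform in, several metric estimates out" contract concrete, here is a minimal, dependency-free sketch. It assumes a hypothetical toy architecture: stacked 1-D convolution, ReLU, and average-pooling stages, then global average pooling over time and a linear head with one output per metric. The class name TinyWaveformNet, the layer count, channel widths, kernel size, and random weights are illustrative placeholders, not the published WAWEnet configuration, and the network is untrained.

```python
import math
import random
from typing import List

Signal = List[List[float]]  # [channel][time]


def conv1d_same(x: Signal, w: List[List[List[float]]], b: List[float]) -> Signal:
    """Multi-channel 1-D convolution (cross-correlation, as in most deep learning
    libraries), zero-padded so output length equals input length.
    x: [in_ch][time], w: [out_ch][in_ch][k] with odd k, b: [out_ch]."""
    n, k = len(x[0]), len(w[0][0])
    pad = k // 2
    padded = [[0.0] * pad + ch + [0.0] * pad for ch in x]
    out = []
    for o in range(len(w)):
        row = []
        for t in range(n):
            s = b[o]
            for c, ch in enumerate(padded):
                for j in range(k):
                    s += w[o][c][j] * ch[t + j]
            row.append(s)
        out.append(row)
    return out


def relu(x: Signal) -> Signal:
    return [[max(0.0, v) for v in ch] for ch in x]


def avg_pool2(x: Signal) -> Signal:
    """Non-overlapping average pooling with window 2 (drops a trailing odd sample)."""
    return [[(ch[i] + ch[i + 1]) / 2.0 for i in range(0, len(ch) - 1, 2)] for ch in x]


class TinyWaveformNet:
    """Hypothetical toy no-reference estimator: waveform in, one score per metric out."""

    def __init__(self, n_outputs: int = 7, channels=(1, 8, 8), kernel: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            scale = 1.0 / math.sqrt(c_in * kernel)
            w = [[[rng.uniform(-scale, scale) for _ in range(kernel)] for _ in range(c_in)]
                 for _ in range(c_out)]
            self.layers.append((w, [0.0] * c_out))
        c_last = channels[-1]
        self.head_w = [[rng.uniform(-1.0, 1.0) / math.sqrt(c_last) for _ in range(c_last)]
                       for _ in range(n_outputs)]
        self.head_b = [0.0] * n_outputs

    def __call__(self, waveform: List[float]) -> List[float]:
        x: Signal = [list(waveform)]  # single input channel: the waveform only, no reference
        for w, b in self.layers:
            x = avg_pool2(relu(conv1d_same(x, w, b)))
        features = [sum(ch) / len(ch) for ch in x]  # global average pooling over time
        # Linear head: one estimate per target metric, all from the same shared features.
        return [sum(wi * f for wi, f in zip(row, features)) + bias
                for row, bias in zip(self.head_w, self.head_b)]


# Usage: a short 440 Hz tone sampled at 16 kHz (toy length, far shorter than real speech).
net = TinyWaveformNet(n_outputs=7)
toy = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(64)]
scores = net(toy)
print(len(scores))  # 7
```

The shared convolutional trunk with a multi-output head is what lets a single network report several metrics at once; changing n_outputs only changes the final linear layer.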
Findings
A single WAWEnet tracks seven quality and intelligibility metrics.
A second network additionally estimates four subjective speech quality dimensions.
A third network achieves high agreement on subjective quality scores.
Abstract
Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms to produce evaluations of those waveforms. In the present work, these evaluations describe qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require "reference" (original or undistorted) versions of the waveforms they evaluate. Our initial WAWEnet publication introduced four WAWEnets, each of which emulated the output of an established full-reference speech quality or intelligibility estimation algorithm. We have updated the WAWEnet architecture to be more efficient and effective. Here we present a single WAWEnet that closely tracks seven different quality and intelligibility values. We create a second network that additionally tracks four…
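The distinction between full-reference algorithms and no-reference networks can be illustrated with one plausible way of assembling training data for a network that emulates a full-reference algorithm: the label is computed with access to the reference, but only the degraded waveform is kept as the model input. This sketch assumes plain SNR as a stand-in full-reference measure (chosen only for illustration, not claimed to be one of the paper's targets) and additive Gaussian noise as a hypothetical impairment; the helper names snr_db and make_training_pairs are hypothetical.

```python
import math
import random
from typing import List, Tuple


def snr_db(reference: List[float], degraded: List[float]) -> float:
    """Full-reference SNR in dB: requires BOTH the clean reference and the degraded signal."""
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((d - r) ** 2 for r, d in zip(reference, degraded))
    return 10.0 * math.log10(signal_energy / noise_energy)


def make_training_pairs(references: List[List[float]], noise_stds: List[float],
                        seed: int = 0) -> List[Tuple[List[float], float]]:
    """Label each degraded signal with a full-reference score, then keep only
    (degraded, score): the reference is used for labeling but never stored."""
    rng = random.Random(seed)
    pairs = []
    for ref in references:
        for std in noise_stds:
            # Hypothetical impairment: additive white Gaussian noise.
            degraded = [r + rng.gauss(0.0, std) for r in ref]
            pairs.append((degraded, snr_db(ref, degraded)))
    return pairs


# Usage: one 300 Hz tone at 16 kHz, impaired at two noise levels.
refs = [[math.sin(2 * math.pi * 300 * t / 16000) for t in range(400)]]
pairs = make_training_pairs(refs, noise_stds=[0.01, 0.3])
for degraded, target in pairs:
    print(len(degraded), round(target, 1))
```

A no-reference model trained on such (degraded, target) pairs sees only the degraded waveform at inference time, which is what removes the need for a reference signal in deployment.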
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
