CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment

Yuchen Liu; Li-Chia Yang; Alex Pawlicki; Marko Stamenovic

arXiv:2211.02577·eess.AS·November 7, 2022

TL;DR

This paper introduces CCAT, a novel non-intrusive speech quality assessment model that combines convolutional and transformer architectures, achieving higher correlation with human ratings across multiple datasets.

Contribution

The paper presents a new end-to-end convolutional transformer model for non-intrusive speech quality prediction, outperforming existing models in correlation and error metrics.

Findings

01

CCAT achieves higher Pearson correlation (0.697) than baseline (0.530).

02

CCAT reduces RMSE from 0.768 to 0.570.

03

Model performs well across multiple languages and distortions.
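
The two metrics quoted in the findings, Pearson correlation and RMSE, compare predicted scores against human MOS labels. The snippet below is a minimal sketch of how such figures are typically computed, using only the Python standard library. The function names and the sample scores are hypothetical and chosen for illustration; they are not taken from the paper, and the paper's own evaluation protocol may differ in details.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length score lists."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical scores on a 1-5 MOS scale (illustrative only, not paper data).
human_mos = [1.5, 2.0, 3.0, 4.0, 4.5]
predicted_mos = [2.0, 2.5, 3.0, 3.5, 4.0]

print(f"Pearson: {pearson(human_mos, predicted_mos):.3f}")
print(f"RMSE:    {rmse(human_mos, predicted_mos):.3f}")
```

A higher Pearson value means the predictions track the ordering and spread of the human ratings more closely, while a lower RMSE means the predicted scores sit closer to the ratings in absolute terms. The two can move independently, which is why both are reported.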

Abstract

Speech quality assessment has been a critical component in many voice communication related applications such as telephony and online conferencing. Traditional intrusive speech quality assessment requires the clean reference of the degraded utterance to provide an accurate quality measurement. This requirement limits the usability of these methods in real-world scenarios. On the other hand, non-intrusive subjective measurement is the "golden standard" in evaluating speech quality as human listeners can intrinsically evaluate the quality of any degraded speech with ease. In this paper, we propose a novel end-to-end model structure called Convolutional Context-Aware Transformer (CCAT) network to predict the mean opinion score (MOS) of human raters. We evaluate our model on three MOS-annotated datasets spanning multiple languages and distortion types and submit our results to the…

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Softmax · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Absolute Position Encodings