Deep Learning-Based Single-Ended Objective Quality Measures for Time-Scale Modified Audio

Timothy Roberts; Aaron Nicolson; Kuldip K. Paliwal

arXiv:2009.02940 · eess.AS · September 9, 2020 · 1 citation

TL;DR

This paper introduces two reference-free deep learning-based objective quality measures for time-scale modified audio, utilizing CNN and BGRU architectures to predict subjective quality scores without needing reference signals.

Contribution

It presents novel single-ended deep learning models for TSM audio quality assessment that do not require reference signals, improving evaluation flexibility.

Findings

01

The CNN measure achieves an average RMSE of 0.608 and a mean Pearson correlation of 0.771.

02

The BGRU measure achieves an average RMSE of 0.576 and a mean Pearson correlation of 0.794.

03

Both measures are used to evaluate 16 TSM algorithms.

Abstract

Objective evaluation of audio processed with Time-Scale Modification (TSM) is seeing a resurgence of interest. Recently, a labelled time-scaled audio dataset was used to train an objective measure for TSM evaluation. This double-ended (DE) measure was an extension of Perceptual Evaluation of Audio Quality, and required both reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Data-driven features are created by either a convolutional neural network (CNN) or a bidirectional gated recurrent unit (BGRU) network and fed to a fully-connected network to predict subjective mean opinion scores. The proposed CNN and BGRU measures achieve an average root mean squared error (RMSE) of 0.608 and 0.576, and a mean Pearson correlation of 0.771 and 0.794, respectively. The proposed measures are used to evaluate TSM…
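
The two figures of merit quoted in the abstract, root mean squared error and Pearson correlation, compare predicted scores against subjective mean opinion scores. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names `rmse` and `pearson` and the score values are hypothetical and chosen only for illustration.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two equal-length score lists."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_error / len(predicted))


def pearson(predicted, target):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Hypothetical scores on a 1-5 opinion scale, for illustration only.
    subjective_mos = [1.8, 2.9, 3.4, 4.1, 4.6]
    predicted_mos = [2.1, 2.7, 3.9, 3.8, 4.4]
    print("RMSE:", rmse(predicted_mos, subjective_mos))
    print("Pearson:", pearson(predicted_mos, subjective_mos))
```

A lower RMSE means predictions sit closer to the subjective scores on the opinion scale, while a Pearson correlation closer to 1 means the predictions rank and scale with listener ratings more consistently. Under these definitions, the BGRU figures reported above are better than the CNN figures on both metrics.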

Code & Models

Repositories

zygurt/TSM
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation