Deep Learning-Based Single-Ended Objective Quality Measures for Time-Scale Modified Audio
Timothy Roberts, Aaron Nicolson, Kuldip K. Paliwal

TL;DR
This paper introduces two deep learning-based objective quality measures for time-scale modified audio. They use CNN and BGRU architectures to predict subjective quality scores without a reference signal.
Contribution
It presents single-ended deep learning models for TSM audio quality assessment. Because they need no reference signal, time-scaled audio can be evaluated when only the processed signal is available.
Findings
The CNN measure achieves an average RMSE of 0.608 and a mean Pearson correlation of 0.771.
The BGRU measure achieves an average RMSE of 0.576 and a mean Pearson correlation of 0.794.
Both measures are used to evaluate 16 TSM algorithms.
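The two metrics reported above can be computed as in the following minimal sketch. It is illustrative only: the score lists are hypothetical values, not data from the paper, and this summary does not state how the paper averages the metrics (for example across folds or training runs).

```python
import math


def rmse(predicted, target):
    """Root mean squared error between predicted and subjective scores."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example values (not taken from the paper).
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
predicted_mos = [2.1, 2.4, 3.4, 3.6, 4.6]

print("RMSE:", rmse(predicted_mos, subjective_mos))
print("Pearson:", pearson(predicted_mos, subjective_mos))
```

A lower RMSE means predictions sit closer to the subjective scores, while a higher Pearson correlation means predictions track the ordering and spread of those scores more faithfully.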
Abstract
Objective evaluation of audio processed with Time-Scale Modification (TSM) is seeing a resurgence of interest. Recently, a labelled time-scaled audio dataset was used to train an objective measure for TSM evaluation. This double-ended (DE) measure was an extension of Perceptual Evaluation of Audio Quality, and required both reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Data-driven features are created by either a convolutional neural network (CNN) or a bidirectional gated recurrent unit (BGRU) network and fed to a fully-connected network to predict subjective mean opinion scores. The proposed CNN and BGRU measures achieve an average Root Mean Squared Error of 0.608 and 0.576, and a mean Pearson correlation of 0.771 and 0.794, respectively. The proposed measures are used to evaluate TSM…
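The single-ended pipeline described in the abstract (recurrent features fed to a fully-connected layer that outputs a mean opinion score) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' architecture: the layer sizes, the random untrained weights, the use of only the final hidden states, and the squashing of the output to a 1 to 5 score range are all assumptions, and the helper names `init_gru`, `gru_last_state`, and `predict_mos` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(rng, n_in, n_hid):
    """Random, untrained GRU weights; index 0/1/2 = update/reset/candidate."""
    return {
        "W": rng.normal(0.0, 0.1, (3, n_hid, n_in)),   # input weights
        "U": rng.normal(0.0, 0.1, (3, n_hid, n_hid)),  # recurrent weights
        "b": np.zeros((3, n_hid)),
    }


def gru_last_state(params, frames):
    """Run a GRU over frames of shape (time, n_in); return the final state."""
    W, U, b = params["W"], params["U"], params["b"]
    h = np.zeros(U.shape[1])
    for x in frames:
        z = sigmoid(W[0] @ x + U[0] @ h + b[0])            # update gate
        r = sigmoid(W[1] @ x + U[1] @ h + b[1])            # reset gate
        cand = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])   # candidate state
        h = (1.0 - z) * h + z * cand
    return h


def predict_mos(frames, fwd, bwd, w_out, b_out):
    """Bidirectional GRU features followed by one fully-connected output."""
    forward_feat = gru_last_state(fwd, frames)
    backward_feat = gru_last_state(bwd, frames[::-1])  # reversed in time
    features = np.concatenate([forward_feat, backward_feat])
    # Assumption: squash the output to a 1..5 opinion-score range.
    return 1.0 + 4.0 * sigmoid(w_out @ features + b_out)


rng = np.random.default_rng(0)
n_in, n_hid = 8, 16                       # assumed feature and hidden sizes
fwd = init_gru(rng, n_in, n_hid)
bwd = init_gru(rng, n_in, n_hid)
w_out = rng.normal(0.0, 0.1, 2 * n_hid)
b_out = 0.0

# Stand-in for per-frame features of one test signal; no reference is used.
frames = rng.normal(size=(50, n_in))
score = predict_mos(frames, fwd, bwd, w_out, b_out)
print("Predicted score (untrained weights):", float(score))
```

Note that only the test signal's frames enter `predict_mos`, which is what makes the measure single-ended. In practice the weights would be trained against labelled subjective scores rather than drawn at random.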
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
