TL;DR
This paper introduces an improved objective quality measure for time-scale modified (TSM) audio. It feeds hand-crafted features to a fully connected network to predict subjective mean opinion scores, reaching a mean RMSE of 0.487 and a mean Pearson correlation of 0.865.
Contribution
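It proposes an objective quality measure that combines Basic and Advanced Perceptual Evaluation of Audio Quality (PEAQ) features with nine features specific to TSM artefacts, compares six alignment methods, and improves on the initial measure built from the same subjectively labelled dataset.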
Findings
The measure achieves a mean Root Mean Squared Error (RMSE) of 0.487 and a mean Pearson correlation of 0.865 against subjective mean opinion scores.
Elastique yields the highest quality for solo and voice signals.
Identity Phase-Locking Phase Vocoder performs best for music signals and overall quality.
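The two headline metrics above compare predicted scores with subjective mean opinion scores. As a minimal sketch of how such figures are typically computed (the function names and the sample scores are hypothetical and are not taken from the paper):

```python
import math

def rmse(predicted, target):
    """Root Mean Squared Error between two equal-length score lists."""
    if len(predicted) != len(target):
        raise ValueError("score lists must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))

def pearson(predicted, target):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(target):
        raise ValueError("score lists must have the same length")
    mean_p = sum(predicted) / len(predicted)
    mean_t = sum(target) / len(target)
    dev_p = [p - mean_p for p in predicted]
    dev_t = [t - mean_t for t in target]
    covariance = sum(a * b for a, b in zip(dev_p, dev_t))
    norm = math.sqrt(sum(a * a for a in dev_p) * sum(b * b for b in dev_t))
    return covariance / norm

# Hypothetical scores on a 1-5 opinion scale, for illustration only.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.5, 4.5]
print(rmse(predicted, subjective), pearson(predicted, subjective))
```

A lower RMSE means predictions sit closer to the subjective scores in absolute terms, while a higher Pearson correlation means they track the same ordering and trend even if offset or scaled.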
Abstract
Objective evaluation of audio processed with Time-Scale Modification (TSM) remains an open problem. Recently, a dataset of time-scaled audio with subjective quality labels was published and used to create an initial objective measure of quality. In this paper, an improved objective measure of quality for time-scaled audio is proposed. The measure uses hand-crafted features and a fully connected network to predict subjective mean opinion scores. Basic and Advanced Perceptual Evaluation of Audio Quality features are used in addition to nine features specific to TSM artefacts. Six methods of alignment are explored, with interpolation of the reference magnitude spectrum to the length of the test magnitude spectrum giving the best performance. The proposed measure achieves a mean Root Mean Squared Error of 0.487 and a mean Pearson correlation of 0.865, equivalent to 98th and 82nd percentiles…
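The best-performing alignment described in the abstract interpolates the reference magnitude spectrum to the length of the test magnitude spectrum. The sketch below illustrates that idea under stated assumptions: the spectra are stored as (frequency bins x time frames) arrays, interpolation is linear along the time axis, and the name interpolate_reference is hypothetical. The paper's exact interpolation scheme may differ.

```python
import numpy as np

def interpolate_reference(ref_mag, n_test_frames):
    """Resample a reference magnitude spectrogram along time.

    ref_mag: array of shape (n_bins, n_ref_frames).
    Returns an array of shape (n_bins, n_test_frames).
    Assumption: linear interpolation, applied per frequency bin.
    """
    ref_mag = np.asarray(ref_mag, dtype=float)
    n_bins, n_ref_frames = ref_mag.shape
    # Place both frame sequences on a shared normalised time axis [0, 1].
    src = np.linspace(0.0, 1.0, n_ref_frames)
    dst = np.linspace(0.0, 1.0, n_test_frames)
    return np.stack([np.interp(dst, src, ref_mag[b]) for b in range(n_bins)])

# Illustrative data: a 4-bin reference with 10 frames, stretched to 15 frames
# so it can be compared frame by frame with a test signal slowed by a factor of 1.5.
rng = np.random.default_rng(0)
reference = np.abs(rng.standard_normal((4, 10)))
aligned = interpolate_reference(reference, 15)
print(aligned.shape)
```

Once the reference has the same number of frames as the test signal, frame-wise features can be computed directly between the two spectrograms.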
