AudioVMAF: Audio Quality Prediction with VMAF

Arijit Biswas; Harald Mundt

arXiv:2308.03437 · eess.AS · August 8, 2023

Open Access

TL;DR

AudioVMAF extends the VMAF video quality assessment tool with an auditory-inspired frontend to accurately predict coded audio quality, outperforming existing audio metrics and improving correlation with human judgments.

Contribution

This work introduces AudioVMAF, a novel extension of VMAF with an auditory-inspired frontend, significantly enhancing audio quality prediction accuracy.

Findings

01

Outperforms existing visual features adapted for audio quality assessment.

02

Achieves overall improvements of 7.8% in Pearson correlation and 2.0% in Spearman rank correlation over ViSQOL-v3.

03

Demonstrates the effectiveness of image replication techniques in audio quality prediction.
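The Pearson and Spearman figures above measure how closely a metric's predictions track listener scores. As a minimal sketch of how such coefficients are computed (the score values below are made up for illustration and are not data from the paper), both can be written in plain Python:

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical listening-test scores and metric predictions (illustrative only).
subjective = [20.0, 35.0, 50.0, 70.0, 90.0]
predicted = [25.0, 30.0, 55.0, 65.0, 95.0]

print("Pearson:", pearson(subjective, predicted))
print("Spearman:", spearman(subjective, predicted))
```

Pearson rewards a linear relationship between predicted and subjective scores, while Spearman only checks that the ordering of items is preserved, which is why quality-metric papers commonly report both.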

Abstract

Video Multimethod Assessment Fusion (VMAF) [1], [2], [3] is a popular tool in the industry for measuring coded video quality. In this study, we propose an auditory-inspired frontend for the existing VMAF that creates videos of reference and coded spectrograms, and we extend VMAF to measure coded audio quality. We name our system AudioVMAF. We demonstrate that image replication can further enhance prediction accuracy, especially when band-limited anchors are present. The proposed method significantly outperforms all existing visual quality features repurposed for audio, and even demonstrates a significant overall improvement of 7.8% and 2.0% in the Pearson and Spearman rank correlation coefficients, respectively, over a dedicated audio quality metric (ViSQOL-v3 [4]) that is also inspired by the image domain.
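To make the idea of turning spectrograms into video frames and replicating images more concrete, here is a rough sketch. The abstract does not specify the frontend, the frame size, or how replication is laid out, so everything here is an assumption: the helper names `split_into_frames` and `replicate_image` are hypothetical, the spectrogram is a toy list of lists, and replication is assumed to mean stacking copies of each frame vertically.

```python
def split_into_frames(spectrogram, frame_width):
    """Cut a spectrogram (one row per frequency band) into consecutive
    images of `frame_width` time columns; a trailing partial chunk is dropped."""
    n_cols = len(spectrogram[0])
    return [
        [row[start:start + frame_width] for row in spectrogram]
        for start in range(0, n_cols - frame_width + 1, frame_width)
    ]

def replicate_image(image, copies):
    """Stack `copies` duplicates of an image vertically.
    The vertical layout is an assumption, not taken from the paper."""
    return [list(row) for _ in range(copies) for row in image]

# Toy spectrogram: 2 frequency bands x 6 time steps (illustrative values only).
spec = [[band * 10 + t for t in range(6)] for band in range(2)]

frames = split_into_frames(spec, frame_width=3)
video = [replicate_image(frame, copies=2) for frame in frames]

for index, frame in enumerate(video):
    print("frame", index, frame)
```

In a setup like this, the same processing would be applied to both the reference and the coded signal, so that a video quality tool can compare the two frame sequences.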

Taxonomy

Topics: Speech and Audio Processing · Image and Signal Denoising Methods · Music and Audio Processing