Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET

Arijit Biswas; Guanxin Jiang

arXiv:2209.11666·eess.AS·September 26, 2022

TL;DR

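Stereo InSE-NET extends the mono InSE-NET model to predict coded stereo audio quality by conditioning the network on left, right, mid, and side channels. Transferring selected weights from the pre-trained mono model and retraining on real and synthetically augmented listening tests yields significant correlation improvements over existing metrics.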

Contribution

The paper introduces Stereo InSE-NET, a stereo-aware extension of InSE-NET that reuses selected weights from the pre-trained mono model and is retrained on both real and synthetically augmented listening tests to improve stereo audio quality prediction.
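As a rough illustration of what transferring selected weights can look like, the sketch below copies parameters from a mono model into a stereo model wherever the name and shape agree, and leaves everything else at its fresh initialization. This is only an assumption about one common selection rule; the page does not say which weights the authors transferred, and the layer names and shapes used here are hypothetical.

```python
import numpy as np


def transfer_matching_weights(mono_params, stereo_params):
    """Copy mono tensors into the stereo model where name and shape match.

    Both arguments are dicts mapping a parameter name to a NumPy array.
    Returns the merged stereo parameters and the list of transferred names.
    Selecting by matching name and shape is an assumption for illustration,
    not the selection rule from the paper.
    """
    merged = {}
    transferred = []
    for name, value in stereo_params.items():
        source = mono_params.get(name)
        if source is not None and source.shape == value.shape:
            merged[name] = source.copy()   # reuse the pre-trained mono weight
            transferred.append(name)
        else:
            merged[name] = value.copy()    # keep the fresh initialization
    return merged, transferred


# Hypothetical layers: the first convolution differs because the stereo
# model sees four input channels instead of one.
mono = {"conv1": np.ones((3, 3, 1, 8)), "dense": np.ones((8, 1))}
stereo = {"conv1": np.zeros((3, 3, 4, 8)), "dense": np.zeros((8, 1))}
merged, transferred = transfer_matching_weights(mono, stereo)
```

In this toy setup only `dense` is transferred, since the input layer changes shape when the number of input channels grows.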

Findings

01 12% improvement in Pearson correlation

02 6% improvement in Spearman correlation

03 Effective transfer learning from mono to stereo models
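For readers unfamiliar with the two correlation measures named in these findings, here is a minimal NumPy sketch of how each can be computed between predicted quality scores and listening-test scores. Pearson measures linear agreement, while Spearman is Pearson applied to ranks and so measures monotonic agreement. The function names and the score values are made up for illustration; this is not the paper's evaluation code.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(values.shape[0], dtype=np.float64)
    ranks[order] = np.arange(1, values.shape[0] + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


def spearman(x, y):
    """Spearman rank correlation: Pearson computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores: the prediction is monotonic in the subjective score
# but not linear, so Spearman is 1 while Pearson is slightly below 1.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(predicted, subjective)
r_spearman = spearman(predicted, subjective)
```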

Abstract

Automatic coded audio quality predictors are typically designed for evaluating single channels without considering any spatial aspects. With InSE-NET [1], we demonstrated mimicking a state-of-the-art coded audio quality metric (ViSQOL-v3 [2]) with deep neural networks (DNN) and subsequently improving it - completely with programmatically generated data. In this study, we take steps towards building a DNN-based coded stereo audio quality predictor and we propose an extension of the InSE-NET for handling stereo signals. The design considers stereo/spatial aspects by conditioning the model with left, right, mid, and side channels; and we name our model Stereo InSE-NET. By transferring selected weights from the pre-trained mono InSE-NET and retraining with both real and synthetically augmented listening tests, we demonstrate a significant improvement of 12% and 6% of Pearson and Spearman…
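The conditioning on left, right, mid, and side channels can be pictured with a short sketch that derives the mid and side signals from a stereo pair and stacks all four. The 0.5 scaling used here, M = (L + R) / 2 and S = (L - R) / 2, is a common convention and an assumption on our part; the abstract does not state the scaling or the feature representation the model actually uses, and the function name is hypothetical.

```python
import numpy as np


def stereo_to_lrms(stereo):
    """Turn a (num_samples, 2) stereo signal into (num_samples, 4): L, R, M, S.

    Mid and side use the assumed convention M = (L + R) / 2, S = (L - R) / 2.
    """
    stereo = np.asarray(stereo, dtype=np.float64)
    if stereo.ndim != 2 or stereo.shape[1] != 2:
        raise ValueError("expected an array of shape (num_samples, 2)")
    left = stereo[:, 0]
    right = stereo[:, 1]
    mid = 0.5 * (left + right)    # content common to both channels
    side = 0.5 * (left - right)   # content that differs between channels
    return np.stack([left, right, mid, side], axis=1)


# Three toy samples: identical channels, opposite channels, left only.
toy = np.array([[1.0, 1.0], [1.0, -1.0], [0.5, 0.0]])
lrms = stereo_to_lrms(toy)
```

With identical channels the side signal is zero, and with opposite-phase channels the mid signal is zero, which is why the side channel is a convenient carrier of spatial differences.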

Taxonomy

Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing